WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 





INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 5: G06F 15/80

A1

(11) International Publication Number: WO 93/04438

(43) International Publication Date: 4 March 1993 (04.03.93)



(21) International Application Number: PCT/US92/06848 

(22) International Filing Date: 13 August 1992 (13.08.92) 



(30) Priority data: 746,038, 16 August 1991 (16.08.91), US



(71) Applicant: THINKING MACHINES CORPORATION [US/US]; 245 First Street, Cambridge, MA 02142 (US).

(72) Inventors: WELLS, David, S.; 39 Bear Hill Road, Bolton, MA 01740 (US). ROWE, Eric, L.; 140 Mill Street, Natick, MA 01760 (US). ISMAN, Marshall; 11 Valley Spring Road, Newton, MA 02158 (US).

(74) Agent: JORDAN, Richard, A.; Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142 (US).



(81) Designated States: AU, CA, JP, European patent (AT, BE, 
CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, 
SE). 



Published 

With international search report 



(54) Title: INPUT/OUTPUT ARRANGEMENT FOR MASSIVELY PARALLEL COMPUTER SYSTEM




A computer comprising a plurality of processing elements (11) and an input/output processor (13) interconnected by a routing network (15). The routing network (15) transfers messages between the processing elements (11) and the input/output processor (13). The processing elements (11) perform processing operations in connection with data received from the input/output processor in messages transferred over the routing network and transfer processed data to the input/output processor in messages over the routing network, the processing elements being connected as a first selected series of leaf nodes. The input/output processor includes a plurality of input/output buffers connected as a second selected series of leaf nodes of the routing network for generating messages for transfer over the routing network to a series of processing elements forming at least a selected subset of the processing elements during an input/output operation.






WO 93/04438 



PCT/US92/06848 




INPUT/OUTPUT ARRANGEMENT FOR 
MASSIVELY PARALLEL COMPUTER SYSTEM 

BACKGROUND OF THE INVENTION

The invention relates generally to the field of digital computer systems, and more particularly to massively parallel computing systems.

A digital computer system generally comprises three basic elements, namely, a memory element, an input/output element and a processor element. The memory element stores information in addressable storage locations. This information includes data and instructions for processing the data. The processor element fetches information from the memory element, interprets the information as either an instruction or data, processes the data in accordance with the instructions, and returns the processed data to the memory element. The input/output element, under control of the processor element, also communicates with the memory element to transfer information, including instructions and the data to be processed, to the memory, and to obtain processed data from the memory.

Recently, computers have been developed which incorporate a large number of processing elements, all of which may operate concurrently on generally the same instruction stream, but with each processing element processing a separate data stream. These processors have been termed "SIMD" processors, for "single-instruction/multiple-data," or, more generally, "SPMD" processors, for "single-program/multiple-data" (collectively referred to herein as "SPMD").

SPMD processors are useful in a number of applications, such as image processing, signal processing, artificial intelligence, database operations, and computer simulation of a number of things, such as electronic circuits and fluid dynamics. In image processing, each processing element may be used to perform processing on a pixel ("picture element") of the image to enhance the overall image. In signal processing, the processors concurrently perform a number of the calculations required to perform such computations as the "Fast Fourier transform" of the data defining the signal. In artificial intelligence, the processors perform searches on extensive rule bases representing the stored knowledge of the particular application. Similarly, in database operations, the processors perform searches on the data in the database, and may also perform sorting and other operations. In computer simulation of, for example, electronic circuits, each processor may represent one part of the circuit, and the processor's iterative computations indicate the response of the part to signals from other parts of the circuit. Similarly, in simulating fluid dynamics, which can be useful in a number of applications such as weather prediction and airplane design, each processor is associated with one point in space, and the calculations provide information about various factors such as fluid flow, temperature, pressure and so forth.

Typical SPMD systems include an SPMD array, which includes the array of processing elements and a router network, a control processor and an input/output component. The input/output component, under control of the control processor, enables data to be transferred into the array for processing and receives processed data from the array for storage, display, and so forth. The control processor also controls the SPMD array, iteratively broadcasting instructions to the processing elements for execution in parallel. The router network enables the processing elements to communicate the results of a calculation to other processing elements for use in future calculations.

A deficiency in many types of computers having a number of processors, including SPMD computers, has been their limited ability to quickly transfer data and other information between the input/output element and the processors.

SUMMARY OF THE INVENTION

The invention provides a parallel computer system including a new and improved input/output arrangement.

In brief summary, the invention in one aspect provides a computer comprising a plurality of processing elements and an input/output processor interconnected by a routing network. The routing network transfers messages between the processing elements and the input/output processor. The processing elements perform processing operations in connection with data received from the input/output processor in messages transferred over the routing network and transfer processed data to the input/output processor in messages over the routing network, the processing elements being connected as a first selected series of leaf nodes. The input/output processor includes a plurality of input/output buffers connected as a second selected series of leaf nodes of the routing network for generating messages for transfer over the routing network to a series of processing elements forming at least a selected subset of the processing elements during an input/output operation.

In another aspect, the invention provides an input/output processor including a plurality of input/output buffers connected to a series of leaf nodes of a routing network for generating messages for transfer over the routing network, during an input/output operation, to a plurality of data receivers, each connected to one of a second series of nodes of the routing network and identified by an address. Each input/output buffer includes a transmit data buffer for buffering a plurality of data items, each to be transmitted in a message to a data receiver. A destination data receiver address and offset generator iteratively generates a destination data receiver address value and a destination offset value in response to the number of input/output buffers and the number of data receivers participating in the input/output operation.

BRIEF DESCRIPTION OF THE DRAWINGS

This invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a general block diagram of a massively parallel computer system constructed in accordance with the invention;

Fig. 2 is a diagram depicting the structure of message packets transmitted over the data router in the computer system depicted in Fig. 1;

Figs. 3A and 3B are functional block diagrams depicting the general structure of selected portions of the computer system of Fig. 1 useful in understanding the invention;

Figs. 4A and 4B are logic diagrams detailing the structure of circuits used in the portion depicted in Fig. 3A which generate information used in connection with generating portions of the message packets depicted in Fig. 2.

DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

Fig. 1 is a general block diagram of a massively parallel computer system 10 constructed in accordance with the invention. With reference to Fig. 1, system 10 includes a plurality of processing elements 11(0) through 11(N) (generally identified by reference numeral 11), scalar processors 12(0) through 12(M) (generally identified by reference numeral 12) and input/output processors 13(0) through 13(K) (generally identified by reference numeral 13). Input/output units (not shown), such as, for example, disk and tape storage units, video display devices, printers and so forth, may be connected to the input/output processors to supply information, including data and program commands, for processing by the processing elements 11 and scalar processors 12 in the system, and may also receive processed data for storage, display and printing. The scalar processors 12 may also be connected to input/output units including, for example, video display terminals which permit one or more operators to generally control system 10. The system 10 may also include a plurality of spare processing elements 11s(0) through 11s(J) (generally identified by reference numeral 11s) which may be used as described below.

The system 10 further includes a control network 14, a data router 15 and a diagnostic network 16. The control network 14 permits one or more scalar processors 12 to broadcast program commands to processing elements 11. The processing elements 11 which receive the commands execute them generally concurrently. The control network 14 also permits the processing elements 11 to generate status information which they may supply to the scalar processors 12. The control network 14 is also used by the processing elements 11 to perform selected types of arithmetic operations, termed "scan" and "reduce" operations. The control network 14 may also be used to provide status and synchronization information among the processing elements 11.

The data router 15 transfers data among the processing elements 11, scalar processors 12 and input/output processors 13. In particular, under control of the scalar processors 12, the input/output processors 13 retrieve data to be processed from the input/output units and distribute it to the respective scalar processors 12 and processing elements 11. During processing, the scalar processors 12 and processing elements 11 can transfer data among themselves over the data router 15. In addition, the processing elements 11 and scalar processors 12 can transfer processed data to the input/output processors 13. Under control of the scalar processors 12, the input/output processors 13 can direct the processed data that they receive from the data router 15 to particular ones of the input/output units for storage, display, printing or the like. The data router 15 in one particular embodiment is also used to transfer input/output commands from the scalar processors 12 to the input/output processors 13 and input/output status information from the input/output processors 13 to the scalar processors 12.

The diagnostic network 16, under control of a diagnostic processor (not shown in Fig. 1), facilitates testing of other portions of the system 10 to identify, locate and diagnose defects. The diagnostic processor may comprise one or more of the scalar processors 12. In addition, the diagnostic network 16 may be used to establish selected operating conditions in the other portions of the system 10.

The system 10 is synchronous, that is, all of its elements operate in accordance with a global SYS CLK system clock signal provided by a clock circuit 17.

One particular embodiment of system 10 may include hundreds or many thousands of processing elements 11 operating on a single problem in parallel under control of commands broadcast to them by the scalar processors 12. In that embodiment, the processing elements 11 operate in parallel on the same command on their individual sets of data, thereby forming a parallel computer system.

In addition, the system 10 may be dynamically logically partitioned, by logical partitioning of the control network 14, into multiple logical subsystems which may concurrently operate on separate problems or separate parts of a single problem. In that case, each partition includes at least one scalar processor 12 and a plurality of processing elements 11, the scalar processor 12 supplying the commands for processing by the processing elements in its partition. The spare processing elements 11s, which except for the positions of their connections to the control network 14 and data router 15 are otherwise similar to processing elements 11, may be used to substitute for failed processing elements 11 in a partition, to augment the number of processing elements in a partition if there are insufficient processing elements 11 to form a partition with a desired number of processing elements 11, or to provide additional processing elements which may themselves be formed into partitions. In the following, unless otherwise explicitly stated, a reference to a processing element 11, in either the singular or plural, will also be taken as a corresponding singular or plural reference to a spare processing element 11s; that is, the processing elements 11 and spare processing elements 11s will be jointly referred to herein generally as processing elements 11.

Details of a control network 14, data router 15, and diagnostic network 16 used in one embodiment of the system 10 are described in International Application No. PCT/US91/07383, International Filing Date 3 October 1991, of Thinking Machines Corporation, entitled Parallel Computer System (published under International Publication No. WO 92/06436 on 16 April 1992), and will not be repeated herein. In brief, both the control network 14 and data router 15 are generally tree-shaped networks (the data router 15 actually comprising a "fat tree") in which the processing elements 11, scalar processors 12 and input/output processors 13 are connected at the leaves. In addition, that International Application describes details of a network interface circuit included in the processing elements 11, scalar processors 12 and input/output processors 13 to enable them to communicate over the data router 15 and control network 14, which also will not be repeated herein.

The invention is generally directed to input/output operations in the system 10. Generally, input/output operations between an input/output processor 13 and the processing elements 11 and scalar processor 12 of a partition are controlled by the partition's scalar processor 12. The scalar processor 12 provides input/output command information to the processing elements 11 of its partition and to the input/output processor(s) 13 that are to engage in the input/output operation. The scalar processor 12 provides the input/output command information to the input/output processor(s) 13 over the data router 15, to facilitate sharing of the input/output processors 13 among multiple partitions. In any case, the input/output command information provided by the scalar processor 12 to both the processing elements 11 and the input/output processor(s) 13 includes, inter alia, an operation identifier which identifies the input/output operation.

The aforementioned International Application further describes in detail the structure of message packets which the processing elements 11, scalar processors 12 and input/output processors 13 transmit over the data router 15 and control network 14 to effect information transfers thereamong. The invention described herein makes use of message packets transmitted over the data router 15, in particular input/output message packets having a particular structure which is depicted in Fig. 2. With reference to Fig. 2, an input/output message packet 2230 includes a message address portion 31, a message data portion 32 and a check portion 33. The message address portion 31 is used to identify a path from the transmitting device to the intended recipient. The transmitting device and the intended recipient may each be a processing element 11, a scalar processor 12 or an input/output processor 13. The message address portion 31 includes a HEADER portion, which contains a level identifier, and a series of down path identifiers DN_Y (index Y is an integer from "M" down to "1"). The level identifier in the HEADER portion identifies the lowest level in the tree that includes both the transmitting device and the intended recipient, and the data router 15 initially couples the input/output message packet 2230 from the transmitting device up to that level in the tree. Thereafter, the data router 15 uses the successive down path identifiers DN_Y to steer the input/output message packet 2230 down the tree to the intended recipient.
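The up-then-down routing rule just described can be illustrated with a short sketch. It assumes, purely for illustration, a uniform tree in which every node has the same number of children (`arity`, defaulting here to four) and in which the leaves are numbered 0, 1, 2, ... from left to right; the actual fat tree of data router 15 is described in the referenced International Application and need not have this shape. Under that assumption, the level identifier is the lowest level at which the transmitter and the recipient share an ancestor, and the down path identifiers DN_Y are the base-`arity` digits of the recipient's leaf number. The function name `route_header` is hypothetical.

```python
def route_header(src: int, dst: int, arity: int = 4) -> tuple[int, list[int]]:
    """Return (level, down_path) for a message from leaf src to leaf dst.

    Assumes a hypothetical uniform tree of the given arity whose leaves are
    numbered 0, 1, 2, ... from left to right; level 0 is the leaf level.
    """
    if src < 0 or dst < 0:
        raise ValueError("leaf numbers must be non-negative")
    level = 0
    # Climb until src and dst fall under the same tree node at this level.
    while src // arity**level != dst // arity**level:
        level += 1
    # Down path identifiers DN_level .. DN_1: the child to take at each level
    # while descending toward dst, i.e. the base-`arity` digits of dst.
    down_path = [(dst // arity**(y - 1)) % arity for y in range(level, 0, -1)]
    return level, down_path


print(route_header(3, 4))  # (2, [1, 0])
```

With four children per node, a message from leaf 3 to leaf 4 must climb to level 2 before it can turn downward, and then descends through child 1 (leaves 4 to 7) and child 0 (leaf 4).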

The message data portion 32 includes a number of fields, including a message length field 34, a message tag field 35, a destination buffer identifier field 2231, a destination buffer offset field 2232 and a destination data field 2233. The message length field 34 identifies the length of the message data portion 32. The message tag field 35 may contain operating system information identifying the packet as an input/output message packet 2230, from among other types of message packets which may be transmitted over the data router 15.

The contents of the destination buffer identification portion 2231 and the destination buffer offset portion 2232 provide information used by the receiving device, for example, by a processing element 11(i) or scalar processor 12 in the case of input/output message packets 2230 transferred from an input/output processor 13, or by an input/output processor 13 in the case of an input/output message packet 2230 received thereby from a processing element 11(i) or a scalar processor 12. In particular, the contents of the destination buffer identification portion 2231 are derived from the input/output operation identifier, which is provided by the scalar processors 12 in their input/output commands. For example, if, as is typical, during an input/output operation the receiver receives data in an input/output buffer that it maintains, the contents of the destination buffer identification portion 2231 may be used to identify the particular buffer into which the receiver is to load the contents of the destination data portion 2233. The contents of the destination buffer offset portion 2232 identify the particular location in that buffer into which the receiver is to load the contents of the destination data portion 2233. It will be appreciated that a number of distinct input/output operations may be performed in system 10 contemporaneously, with the input/output message packets 2230 having diverse values in their destination buffer identification portions 2231.

In addition, while the particular message transmitter, which may comprise either a processing element 11(i) or a scalar processor 12, on the one hand, or the input/output processor 13, on the other hand, may generate and transmit input/output message packets 2230 in the order in which it has the data to be transmitted, it will be appreciated that the message receivers may receive the input/output message packets 2230 in random order. The contents of the destination buffer offset portion 2232 of each input/output message packet 2230 enable the receiver to properly order the data contained in the destination data portions 2233 of the received input/output message packets 2230 that are associated with the particular input/output operation, as indicated by the contents of their destination buffer identification portions 2231.
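Because packets may arrive in any order, the receiver can rely on the destination buffer identifier (field 2231) and destination buffer offset (field 2232) alone to place each data item. The following is a minimal sketch of that reassembly, assuming packets are represented as hypothetical `(buffer_id, offset, data)` tuples rather than as real packets; it illustrates the principle and is not the receiver implementation of the referenced system.

```python
import random
from collections import defaultdict


def reassemble(packets):
    """Group packets by destination buffer identifier (field 2231) and place
    each data item (field 2233) at its destination buffer offset (field 2232).

    Returns {buffer_id: list}; locations not yet filled hold None.
    """
    received = defaultdict(dict)  # buffer_id -> {offset: data}
    for buffer_id, offset, data in packets:
        received[buffer_id][offset] = data
    buffers = {}
    for buffer_id, slots in received.items():
        buf = [None] * (max(slots) + 1)
        for offset, data in slots.items():
            buf[offset] = data
        buffers[buffer_id] = buf
    return buffers


# Two concurrent input/output operations (buffer ids 7 and 9), delivered shuffled.
packets = [(7, y, f"op7-item{y}") for y in range(5)] + [(9, y, f"op9-item{y}") for y in range(3)]
random.Random(0).shuffle(packets)
buffers = reassemble(packets)
print(buffers[7])
# ['op7-item0', 'op7-item1', 'op7-item2', 'op7-item3', 'op7-item4']
```

Keying first on the buffer identifier is what lets several input/output operations proceed contemporaneously without their data being mixed.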

Finally, the check portion 33 contains a cyclic redundancy check value which may be used to verify that the input/output message packet 2230 was correctly received.
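To make the packet layout of Fig. 2 concrete, here is a hypothetical byte-level model of input/output message packet 2230. Every field width, the byte order, and the choice of CRC are assumptions: the text does not specify them, and `zlib.crc32` is used only as a readily available stand-in for whatever cyclic redundancy check the check portion 33 actually carries. The class and function names are illustrative.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class IOMessagePacket:
    """Hypothetical model of an input/output message packet 2230 (Fig. 2)."""
    level: int            # HEADER level identifier (address portion 31)
    down_path: list[int]  # down path identifiers DN_Y (address portion 31)
    tag: int              # message tag field 35
    buffer_id: int        # destination buffer identifier field 2231
    buffer_offset: int    # destination buffer offset field 2232
    data: bytes = b""     # destination data field 2233

    def address_portion(self) -> bytes:
        # Assumed encoding: one byte for the level, one byte per DN_Y.
        return bytes([self.level, *self.down_path])

    def data_portion(self) -> bytes:
        # Assumed encoding: 2-byte length 34, 2-byte tag 35,
        # 4-byte field 2231, 4-byte field 2232, then field 2233.
        rest = struct.pack(">HII", self.tag, self.buffer_id, self.buffer_offset) + self.data
        length = 2 + len(rest)  # length field 34 covers the whole data portion 32
        return struct.pack(">H", length) + rest

    def encode(self) -> bytes:
        body = self.address_portion() + self.data_portion()
        return body + struct.pack(">I", zlib.crc32(body))  # check portion 33


def verify(raw: bytes) -> bool:
    """Recompute the stand-in CRC over everything except the trailing 4-byte check."""
    body, (check,) = raw[:-4], struct.unpack(">I", raw[-4:])
    return zlib.crc32(body) == check


pkt = IOMessagePacket(level=2, down_path=[1, 0], tag=5, buffer_id=42, buffer_offset=3, data=b"abc")
print(verify(pkt.encode()))  # True
```

Corrupting any single byte of the encoded packet makes `verify` return False, which is the role the text assigns to the check portion.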

The invention provides an arrangement for generating information for the message address portion 31 and destination buffer offset portion 2232 of an input/output message packet 2230.

A brief description of a parallel mode message transfer operation will be presented in connection with Figs. 3A and 3B. These figures schematically depict, respectively, a number of input/output buffer nodes 2201(0) through 2201(6) (Fig. 3A) comprising portions of an input/output processor 13 participating in an input/output operation with a partition of processing elements identified by reference numerals 11(0) through 11(5) (Fig. 3B). In particular, Fig. 3A schematically represents, for each input/output buffer node 2201(i), a network interface 2277(i), a parallel send address/offset generator 2278(i) and a parallel mode buffer in the buffer memory 2223(i). Similarly, Fig. 3B schematically represents the network interface 202(i) and a memory buffer for each processing element 11(i). The network interfaces 2277(i) of sequentially-indexed input/output buffer nodes 2201(i) are connected as sequential leaves of the data router 15. Similarly, the network interfaces 202(i) of the sequentially-indexed processing elements 11(i) are connected as sequential leaves of the data router 15. The connections to data router 15 of the network interfaces 2277(i) of the input/output buffer nodes 2201(i) may be adjacent to the connections of the network interfaces 202(i) of the processing elements 11(i), or they may be separated. The number of input/output buffer nodes and processing elements participating in an input/output operation will be generally identified as "N" and "NPE," respectively.

As represented schematically in Fig. 3A, if, during the input/output operation, data is to be transferred in input/output message packets from an input/output device (not shown) to the processing elements, a device interface 2202 transfers data to the buffers of the input/output buffer nodes 2201(i) on a round-robin basis. That is, the device interface 2202 will transmit the first item of data to input/output buffer node 2201(0), the second item of data to input/output buffer node 2201(1), the third item of data to input/output buffer node 2201(2), and so forth, where each "item of data" refers to the amount of data which it receives from the input/output device to be transmitted in an input/output message packet. After the device interface 2202 transmits an item of data to the last input/output buffer node participating in the input/output operation, here input/output buffer node 2201(6), it transmits the next item of data to input/output buffer node 2201(0), thereby ensuring that data is transmitted to the input/output buffer nodes in round-robin fashion.
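The round-robin distribution performed by device interface 2202 reduces to modular arithmetic: with N participating input/output buffer nodes, the k-th item of data (counting from zero) goes to node k mod N. Assuming each node appends arriving items to its buffer in order, that item also lands at offset k // N (integer division) in the node's buffer. The sketch below models the buffers as Python lists; `deal_round_robin` is a hypothetical name, not part of the described hardware.

```python
def deal_round_robin(items, n_nodes):
    """Distribute items to n_nodes buffers in round-robin order.

    Item k goes to buffer k % n_nodes; because each buffer is filled in
    arrival order, it ends up at offset k // n_nodes within that buffer.
    """
    buffers = [[] for _ in range(n_nodes)]
    for k, item in enumerate(items):
        buffers[k % n_nodes].append(item)
    return buffers


# Seven buffer nodes, as in Figs. 3A and 3B: items 7, 8 and 9 wrap back to nodes 0, 1 and 2.
print(deal_round_robin(range(10), 7))
# [[0, 7], [1, 8], [2, 9], [3], [4], [5], [6]]
```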

The items of data transmitted to the input/output buffer nodes 2201(i) are arranged by the input/output device and device interface 2202 so that they will be directed to the processing elements 11(i) of increasing values of index "i," also on a round-robin basis with respect to the index of the processing element reference numeral. However, a selected number of sequential items of data directed to the input/output buffer nodes 2201(i) may be intended for the same processing element; this number is termed herein a "striping factor," and is generally identified as "C."
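Combining round-robin order over processing elements with the striping factor C, the destination of each item of data follows from its position k (counting from zero) in the sequence supplied by the device interface: items are taken C at a time, successive groups go to processing elements 0, 1, ..., NPE - 1 in turn, and each processing element receives its items at successive offsets. The sketch below is a hedged reconstruction of that mapping from the example of Figs. 3A and 3B, not a formula stated in the text; it returns the pair (x, y) of the data item labelled PE(x) MSG(y) in the discussion that follows. The name `destination` is hypothetical.

```python
def destination(k, n_pe, c):
    """Return (x, y) such that the k-th item of data (k = 0, 1, ...) is the
    item destined for processing element x at offset y, i.e. PE(x) MSG(y).

    n_pe is NPE, the number of participating processing elements, and c is
    the striping factor C.
    """
    group = k // c                   # runs of C consecutive items share a destination
    x = group % n_pe                 # runs go to processing elements round-robin
    y = (group // n_pe) * c + k % c  # completed rounds times C, plus position in the run
    return x, y


# NPE = 6 and C = 3, as in Figs. 3A and 3B.
print([destination(k, 6, 3) for k in (0, 2, 3, 6, 17, 18)])
# [(0, 0), (0, 2), (1, 0), (2, 0), (5, 2), (0, 3)]
```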

In addition, the items of data sequentially received by an input/output buffer node 2201(i) are stored at locations having successive offsets in the buffer of its buffer memory 2223(i). In both Figs. 3A and 3B, the base of a buffer, that is, the location with a zero offset, is depicted at the uppermost location in the respective buffer, and successive offsets are represented by the successively descending positions in the buffer.

Thus, for example, using the example depicted in Figs. 3A and 3B of seven input/output buffer nodes 2201(0) through 2201(6), six processing elements 11(0) through 11(5), and a striping factor of three, the data items for the first three messages for processing element 11(0) are transferred from the device interface 2202 to input/output buffer nodes 2201(0) through 2201(2) and are represented in the respective buffer memories 2223(0) through 2223(2) as "PE(0) MSG(0)" through "PE(0) MSG(2)." The device interface 2202 next transmits the data items for the first three messages for processing element 11(1) to input/output buffer nodes 2201(3) through 2201(5), where they are represented in the respective buffer memories 2223(3) through 2223(5) as "PE(1) MSG(0)" through "PE(1) MSG(2)." Thereafter, the device interface 2202 transmits the data item for the first message for processing element 11(2) to input/output buffer node 2201(6), and the data items for the second and third messages for the same processing element 11(2) to input/output buffer nodes 2201(0) and 2201(1). These data items are represented in the respective buffer memories 2223(6), 2223(0) and 2223(1) by the legends "PE(2) MSG(0)," "PE(2) MSG(1)" and "PE(2) MSG(2)," respectively. The device interface transmits the successive items of data to the input/output buffer nodes 2201 in the same way.

In the following, data items will be generally identified as "PE(x) MSG(y)," where "x" identifies the processing element and "y" identifies the offset. With reference to Fig. 3A, it can be observed that the first data item PE(0) MSG(0) of the first series of data items provided by the input/output device to be transmitted to processing element 11(0) is in the buffer of buffer memory 2223(0) of input/output buffer node 2201(0) at offset zero. The last data item PE(5) MSG(2) of the first series of data items, to be transmitted to the last processing element 11(5), is in the buffer of buffer memory 2223(3) of input/output buffer node 2201(3) at offset two. This set of buffer locations across the buffers of the group of input/output buffer nodes 2201(0) through 2201(6) that are participating in an input/output operation will be termed a "frame."

More generally, a frame is a set of buffer locations, across the buffers of the input/output buffer nodes 2201(i) participating in an input/output operation, extending from the first data item PE(x) MSG(y) in a series to be transmitted as a stripe to the first processing element 11(0) to the last data item PE(x) MSG(y) in the corresponding series to be transmitted as the same stripe to the last processing element 11(5). Each of the sequence of frames in the buffer memories 2223(i) will be identified by a frame identifier value. That is, the frame containing locations from offset zero of the buffer of buffer memory 2223(0), which contains data item PE(0) MSG(0), to offset two of the buffer of buffer memory 2223(3), which contains data item PE(5) MSG(2), will be identified as frame zero. Similarly, the frame containing locations from offset two of the buffer of buffer memory 2223(4), which contains data item PE(0) MSG(3), to the offset of the buffer memory which contains data item PE(5) MSG(5) (not shown) will be identified as frame one, and so forth.

The series of data items PE(x) MSG(y) in a frame that are to be transferred to a particular processing element 11(i) or scalar processor 12 will be termed a "stripe." Each of the sequence of stripes in the buffer memories will be identified by a stripe offset value, which identifies the offset of the stripe from the beginning of a frame. That is, in the first frame, the data items in the first stripe, that is, the stripe at offset zero, containing data items PE(0) MSG(0) through PE(0) MSG(2), are to be transferred to the first processing element 11(0) in the series participating in the input/output operation. Similarly, the data items in the second stripe, that is, the stripe at offset one, containing data items PE(1) MSG(0) through PE(1) MSG(2), are to be transferred in input/output message packets 2230 to the second processing element 11(1) in the series participating in the input/output operation, and so forth.

Each buffer location in the frame will also be termed a slot and will be identified by a slot offset value identifying the offset of the particular slot from the beginning of a frame. Thus, the location of offset zero of the buffer of buffer memory 2223(0) has a slot offset value of zero, the location of offset zero of the buffer of buffer memory 2223(1) has a slot offset value of one, and so forth. The location of offset two of the buffer of buffer memory 2223(3), which contains data item PE(5) MSG(2), has a slot offset value of seventeen. Similarly, the location of offset two of the buffer of buffer memory 2223(4), which contains data item PE(0) MSG(3) and is the first slot of the second frame, has a slot offset value of zero. It will be appreciated that the number of slots in a frame, and thus the number of data items PE(x) MSG(y) in a frame, corresponds to the number of processing elements NPE times the striping factor C.
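As a hedged illustration of these definitions, the following Python sketch locates a data item PE(x) MSG(y) within the frame, stripe and slot layout. The function name `locate_item` is hypothetical. The placement across buffer nodes is an assumption rather than a statement of the specification: item number g in the overall sequence is taken to be held by buffer node g mod N at buffer offset g div N, which is consistent with the slot numbering described above.

```python
def locate_item(x: int, y: int, c: int, npe: int, n: int) -> dict[str, int]:
    """Locate data item PE(x) MSG(y) in the frame/stripe/slot layout (sketch).

    x: processing element index, y: offset in that element's buffer,
    c: striping factor C, npe: number NPE of participating elements,
    n: number N of input/output buffer nodes (round-robin placement assumed).
    """
    slots_per_frame = c * npe                    # a frame holds NPE stripes of C slots
    frame = y // c                               # frame identifier value
    stripe = x                                   # stripe offset = destination element index
    slot = stripe * c + y % c                    # slot offset from the start of the frame
    position = frame * slots_per_frame + slot    # position in the overall item sequence
    return {
        "frame": frame,
        "stripe": stripe,
        "slot": slot,
        "buffer_node": position % n,             # assumed round-robin distribution
        "buffer_offset": position // n,
    }
```

For example, with C = 3 and NPE = 6, and with N = 7 (a value inferred from the buffer positions quoted above, not stated in the text), `locate_item(0, 3, 3, 6, 7)` gives frame one, slot zero, buffer node 4 and buffer offset 2. This matches the location given above for PE(0) MSG(3).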

As also described above, the input/output buffer nodes 2201(i) transmit the successive data items PE(x) MSG(y) in their respective buffer memories to the processing elements 11(i), as represented on Fig. 3B. As shown on Fig. 3B, each processing element receives the messages containing the data items for its index "x" in the data item identification PE(x) MSG(y), and stores them in successive offsets "y." Thus, it will be appreciated that the indices "x" and "y" in the data item identification PE(x) MSG(y) reference the processing element identification and the offset, respectively.

It will further be appreciated that complementary operations will occur in an input/output operation in the reverse direction to transfer data items from the successive buffer offsets of the processing elements 11(i), through the buffer memories 2223 of the input/output buffer nodes, to the input/output device. In that case, however, the processing element 11(0) will transmit the first three data items PE(0) MSG(0), PE(0) MSG(1), and PE(0) MSG(2) in its buffer to the input/output buffer nodes 2201(0) through 2201(2), and so forth. Thus, the input/output buffer node identifications used in the address portions 31 of the input/output message packets will be related to the index "y" of the data item identification PE(x) MSG(y), and the buffer offset will be related to the index "x."

The parallel send address/offset generator 2278(i) in each input/output buffer node 2201(i) generates, for each input/output message packet, information providing the processing element identification "x," in particular, the address of the processing element, which the network interface 2277(i) uses to generate the information for the message address portion 31 of the input/output message packet 2230. In addition, the parallel send address/offset generator 2278(i) generates the offset "y" for the data item PE(x) MSG(y). In this operation, the parallel send address/offset generator 2278(i) operates using several items of information, including:
(a) the number "N" of input/output buffer nodes participating in the input/output operation,
(b) the striping factor "C,"
(c) the number "NPE" of processing elements participating in the input/output operation,
(d) the index "i" of the input/output buffer node 2201(i), and
(e) the address of the first processing element 11(0) in the partition participating in the input/output operation, relative to the base of the processing element of the system.

These items of information may be provided when initiating the input/output operation.

From these items of information, the parallel send address/offset generator 2278 may determine the following initial values used in connection with generating the first input/output message packet 2230 in the input/output operation:
(a) an initial destination processing element address value,
(b) an initial destination offset value, comprising (i) an initial offset base value and (ii) an initial offset delta value, both of which the parallel send address/offset generator 2278 will use to determine an initial destination processing element buffer offset value, and
(c) an initial slot value,
and the following incrementation values used in connection with generating subsequent input/output message packets 2230, if any, in the input/output operation:
(d) a destination processing element address incrementation value,
(e) offset incrementation values, including (i) an offset base incrementation value and (ii) an offset delta incrementation value, and
(f) a slot incrementation value.

It will be appreciated that these values may alternatively be provided when initiating the input/output operation.

A parallel send address/offset generator 2278(i), a detailed block diagram of which is depicted in Figs. 4A and 4B, includes four general sections, namely, a destination processing element address generating section 2310, an offset delta generating section 2311, an offset base generating section 2312, and a slot count section 2313. The offset base generating section 2312 and offset delta generating section 2311 generate, respectively, OFFSET BASE and OFFSET DELTA signals, which are coupled to an adder 2314. The adder 2314, in turn, generates DEST OFFSET destination offset signals representing a value corresponding to the arithmetic sum of the values represented by the OFFSET BASE and OFFSET DELTA signals, which are latched in a latch 2315. The parallel send address/offset generator 2278(i) also couples the DEST OFFSET signals over bus 2287, to be used by the network interface 2277 in generating the destination buffer offset portion of an input/output message packet.

A destination offset value is essentially formed from two components. The first relates to the frame identifier of the frame containing the data item PE(x) MSG(y) being transmitted by the input/output buffer node. The second relates to the offset of the slot containing the data item PE(x) MSG(y) in the series within the frame that is to be transferred to the same processing element 11(i) or scalar processor 12. In particular, the binary-encoded value represented by the OFFSET BASE signals, generated by the offset base generating section 2312, represents the component relating to the frame identifier value. Similarly, the binary-encoded value represented by the OFFSET DELTA signals represents the component relating to the position of the slot containing the data item in the sequence within a stripe.

The offset base generating section 2312 uses the initial offset base value and the offset base incrementation value, as well as the striping factor "C" and a BUMP OFFSET BASE signal from the slot count section 2313, in generating the OFFSET BASE signal. The initial offset base value for a parallel send address/offset generator 2278(i) relates to the frame of the first data item to be transmitted during the input/output operation. The frame identifier value of that data item PE(x) MSG(y) corresponds to the greatest integer in the quotient of (a) the input/output buffer node's index "i," divided by (b) the number of data items in a frame, which corresponds to the striping factor "C" times the number "NPE" of processing elements 11(i) participating in the input/output operation. The frame identifier, in turn, is multiplied by the striping factor "C," since for each subsequent frame the base offset value for the first data item PE(x) MSG(y) in each stripe corresponds to this value.

The offset base incrementation value is related to the number of frames that the input/output buffer node will increment between transmission of input/output message packets 2230. It will be appreciated that the number of frames will correspond to the greatest integer in the quotient of (a) the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation, divided by (b) the number of slots in a frame, that is, the striping factor "C" times the number "NPE" of processing elements 11(i) participating in the input/output operation. This value is also multiplied by the striping factor "C," since the base for each subsequent frame will begin with a value corresponding to the frame identifier times the striping factor.
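The two offset base relations above reduce to integer division followed by scaling by the striping factor. A minimal Python sketch follows; the function and parameter names are hypothetical.

```python
def offset_base_params(i: int, n: int, c: int, npe: int) -> tuple[int, int]:
    """Return (initial offset base value, offset base incrementation value).

    i: index of the input/output buffer node, n: number N of buffer nodes,
    c: striping factor C, npe: number NPE of participating processing elements.
    """
    slots_per_frame = c * npe
    initial_frame = i // slots_per_frame        # frame holding the node's first data item
    frames_per_packet = n // slots_per_frame    # whole frames advanced between packets
    # Both values are scaled by C, because each frame adds C offsets per element.
    return initial_frame * c, frames_per_packet * c
```

With C = 3 and NPE = 6 (18 slots per frame), a node with index 20 in a 40-node operation would start at offset base 3 and advance by 6 per packet, before any further adjustment for the slot offset carry described next.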

It will be appreciated that, if the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation is not a multiple of the number of slots in a frame, the offset of the slot containing the data item PE(x) MSG(y) being transmitted will change for each subsequent input/output message packet. The change in the slot offset corresponds to the remainder of the quotient of (a) the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation, divided by (b) the number of slots in a frame, that is, the striping factor "C" times the number "NPE" of processing elements 11(i) participating in the input/output operation. This remainder, in turn, corresponds to the number "N" modulo the number of slots in a frame. As a result of this change in slot offset, the offset base generating section 2312 further increments the base offset value, by the striping factor "C," when the change of the offset of the slot from one input/output message packet 2230 to the next would extend beyond the number of slots in a frame. The slot count section 2313 generates the BUMP OFFSET BASE signal when this condition occurs.

The slot count section 2313 maintains a running index of the slot in the frame of the data item PE(x) MSG(y) for which the parallel send address/offset generator 2278(i) is currently generating DEST PE ADRS and DEST OFFSET signals. To maintain this running index, the slot count section 2313 uses the initial slot value and the slot incrementation value, as well as a correction value corresponding to the number of slots in a frame. The initial slot value corresponds to the index "i" of the parallel send address/offset generator 2278(i), modulo the number of slots in a frame. The slot incrementation value is, as noted above, the number "N" of input/output buffer nodes 2201(i), modulo the number of slots in a frame. When the slot count section 2313 generates a slot count value that equals or exceeds the number of slots in a frame, it asserts the BUMP OFFSET BASE signal and reduces the slot count value by the number of slots in a frame. The result is the offset of the slot in the next frame.
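The combined behaviour of the slot count section and the offset base generating section can be modelled arithmetically. This Python sketch (function and parameter names are hypothetical) advances both values for one packet and reports whether the BUMP OFFSET BASE condition occurred. It models the arithmetic described above, not the register-level circuit of Figs. 4A and 4B.

```python
def step_slot_and_base(slot: int, base: int, n: int, c: int, npe: int) -> tuple[int, int, bool]:
    """Advance slot count and offset base by one packet; return (slot, base, bump)."""
    slots_per_frame = c * npe
    slot_inc = n % slots_per_frame            # slot incrementation value
    base_inc = (n // slots_per_frame) * c     # offset base incrementation value
    slot += slot_inc
    bump = slot >= slots_per_frame            # models the BUMP OFFSET BASE condition
    if bump:
        slot -= slots_per_frame               # offset of the slot in the next frame
    base += base_inc + (c if bump else 0)     # one extra frame's worth (C) on a bump
    return slot, base, bump
```

Starting from `slot = i % (C*NPE)` and `base = (i // (C*NPE)) * C`, repeated calls track the slot and offset base of item i + k*N for k = 1, 2, ...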

The destination processing element address generating section 2310 uses the following in generating DEST PE ADRS destination processing element address signals: (i) the initial destination processing element address value, (ii) the destination processing element address incrementation value, (iii) the number "NPE" of processing elements participating in the input/output operation, (iv) the address of the first processing element 11(0) in the partition participating in the input/output operation, relative to the base of the processing element of the system, and (v) a BUMP DEST ADRS bump destination address signal from the offset delta generating section 2311. The parallel send address/offset generator 2278(i) couples the DEST PE ADRS signals to the network interface 2277, which uses them in generating the message address portion 31 of the input/output message packet 2230.

It will be appreciated that, for the sequence of stripes in a frame, all of the data items PE(x) MSG(y) in slots in a stripe are to be transmitted in input/output message packets 2230 to one processing element 11(i) or scalar processor 12 participating in the input/output operation. The initial destination processing element address value for each parallel send address/offset generator 2278(i) thus relates to the stripe offset value for the stripe within the frame containing the first data item PE(x) MSG(y) to be transmitted by the input/output buffer node 2201(i). The stripe offset value, in turn, corresponds to the greatest integer of the quotient of the input/output buffer node's index "i" divided by the striping factor "C," modulo the number of stripes in a frame. The number of stripes in a frame corresponds to "NPE," the number of processing elements 11(i) and scalar processors 12 participating in the input/output operation.
The stripe offset value so generated is actually the offset, from the first processing element 11(0) or scalar processor 12 in the partition participating in the input/output operation, for the first input/output message packet 2230 to be generated by the input/output buffer node. Accordingly, the initial destination processing element address value is this stripe offset value plus the address of the first processing element 11(0) or scalar processor 12 participating in the input/output operation, relative to the base of the processing element of the system 10.

The destination processing element address incrementation value is used by a parallel send address/offset generator 2278(i) when generating a destination processing element address for each subsequent input/output message packet 2230 generated by its input/output buffer node 2201(i). The destination processing element address incrementation value is related to the number of stripes within a frame that the input/output buffer node 2201(i) will increment between transmission of input/output message packets 2230. Thus, the destination processing element address incrementation value corresponds to the greatest integer of the quotient of the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation divided by the striping factor "C," modulo the number of stripes in a frame, that is, "NPE."
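The initial destination address and its incrementation value follow directly from the two preceding paragraphs. A minimal Python sketch is shown below. The names are hypothetical, and `first_pe` stands for the address of the first participating processing element 11(0).

```python
def pe_address_params(i: int, n: int, c: int, npe: int, first_pe: int) -> tuple[int, int]:
    """Return (initial destination PE address, PE address incrementation value)."""
    stripe_offset = (i // c) % npe       # stripe of the node's first data item
    initial_address = first_pe + stripe_offset
    address_inc = (n // c) % npe         # whole stripes advanced between packets, modulo NPE
    return initial_address, address_inc
```

For instance, with C = 3 and NPE = 6, buffer node 4 of 7 starts at the second participating element (stripe offset 1) and advances two stripes per packet, before any BUMP DEST ADRS adjustment.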

It will be appreciated that, if the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation is not a multiple of the number of slots in a stripe, the offset of the slot containing the data item PE(x) MSG(y) being transmitted within a stripe will change for each subsequent input/output message packet. The change in the slot offset corresponds to the remainder of the quotient of (a) the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation, divided by (b) the number of slots in a stripe, that is, the striping factor "C." This remainder, in turn, corresponds to the number "N" modulo the striping factor. As a result of this change in slot offset within a stripe, the destination processing element address generating section 2310 further increments the destination processing element address when the change of the offset of the slot from one input/output message packet 2230 to the next would extend beyond the number of slots in a stripe. The offset delta generating section 2311 generates the BUMP DEST ADRS signal when this condition occurs.

The offset delta generating section 2311 also generates the OFFSET DELTA signal. As noted above, this signal represents the component of the DEST OFFSET signal whose binary-encoded value identifies the position of the slot of the data item PE(x) MSG(y) being transmitted within a stripe, that is, within the series of data items within a frame that are to be transmitted to the same processing element 11(i) or scalar processor 12. In addition, the offset delta generating section 2311 generates the BUMP DEST ADRS bump destination address signal, which is directed to the destination processing element address generating section 2310.

The initial offset delta value for a parallel send address/offset generator 2278(i) corresponds to the offset, within the stripe, of the slot containing the first data item PE(x) MSG(y) to be transmitted by the parallel send address/offset generator 2278(i). Thus, the initial offset delta value corresponds to the remainder in the quotient of (a) the index "i" of input/output buffer node 2201(i), divided by (b) the number of slots in a stripe, that is, the striping factor "C." Otherwise stated, the initial offset delta value corresponds to the input/output buffer node's index "i," modulo the striping factor "C."

The offset delta incrementation value is related to the number of slots within a stripe that the input/output buffer node 2201(i) will increment between transmission of input/output message packets 2230. As noted above, the number of stripes that the input/output buffer node 2201(i) will increment between transmission of input/output message packets 2230 is related to the change, if any, of the destination processing element address value as determined by the destination processing element address generating section 2310. Thus, the offset delta incrementation value is the remainder in the quotient of (a) the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation, divided by (b) the number of slots in a stripe, that is, the striping factor "C." Otherwise stated, the offset delta incrementation value corresponds to the number "N" of input/output buffer nodes 2201(i) participating in the input/output operation, modulo the striping factor "C."

It will be appreciated that the incrementation of the offset delta value by the offset delta generating section 2311, from one input/output message packet 2230 to the next, may result in an offset delta value greater than or equal to the striping factor "C." In that case, the offset delta value would actually relate to a slot in a stripe advanced beyond the stripe identified by the destination processing element address value, as determined by the destination processing element address generating section 2310. This advanced stripe, in turn, includes slots whose data items PE(x) MSG(y) are to be transmitted to the next processing element 11(i) beyond that identified by the destination processing element address value. When that occurs, the offset delta generating section 2311 asserts the BUMP DEST ADRS bump destination address signal, to enable the destination processing element address generating section 2310 to further increment the destination processing element address. In addition, the offset delta generating section 2311 subtracts the striping factor from the incremented offset delta value. The result points to the position of the slot of the data item PE(x) MSG(y) being transmitted in the input/output message packet, within the stripe associated with the destination processing element address that the destination processing element address generating section 2310 generates for that data item.
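The offset delta behaviour just described can be sketched in Python as follows. The names are hypothetical, and the sketch models the arithmetic only. The initial value would be i mod C, and each call applies the incrementation value N mod C together with the wrap that generates BUMP DEST ADRS.

```python
def step_offset_delta(delta: int, n: int, c: int) -> tuple[int, bool]:
    """Advance the offset delta by one packet; return (delta, bump_dest_adrs)."""
    delta += n % c          # offset delta incrementation value, N mod C
    bump = delta >= c       # slot now lies in the next stripe (BUMP DEST ADRS)
    if bump:
        delta -= c          # position of the slot within that next stripe
    return delta, bump
```

A single subtraction suffices because the previous delta is below C and the increment is below C, so their sum is below 2C.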

Similarly, at some point the destination processing element address generating section 2310 will increment the destination processing element address to be above the address of the highest-indexed processing element 11(i) or scalar processor 12 participating in the input/output operation. At that point, the destination processing element address generating section 2310 corrects the destination processing element address to a value which is the address of one of the processing elements or scalar processors participating in the transfer. In this operation, the destination processing element address generating section 2310 reduces the incremented destination processing element address by an amount corresponding to NPE, the number of processing elements participating in the input/output operation. This ensures that the destination processing element address points to a processing element or scalar processor participating in the input/output operation throughout the operation.
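Combining the address increment, the BUMP DEST ADRS adjustment and the NPE correction gives the following per-packet address update. This is a Python sketch with hypothetical names. It assumes the bump simply adds one to the increment; the text at this point does not say how the hardware applies it.

```python
def step_pe_address(address: int, bump: bool, n: int, c: int, npe: int, first_pe: int) -> int:
    """Advance the destination PE address by one packet, keeping it inside the partition."""
    address += (n // c) % npe + (1 if bump else 0)   # increment value plus BUMP DEST ADRS
    if address >= first_pe + npe:                    # beyond the highest-indexed participant
        address -= npe                               # correct by NPE
    return address
```

Because the previous address is inside the partition and the increment plus bump is at most NPE, a single subtraction of NPE always brings the address back into range.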

With this background, the structure and operation of parallel send address/offset generator 2278(i) will be described in connection with Figs. 4A and 4B. Initially, the initial destination processing element address, which is represented by block 2320, is coupled through multiplexer 2340 and latched in latch 2341. In addition, the destination processing element address increment value is stored in register 2321 of the destination processing element address generating section 2310.

Similarly, the initial offset delta value and initial offset base value, which are represented by blocks 2322 and 2324, respectively, are coupled through multiplexers 2342 and 2344, respectively, as OFFSET DELTA and OFFSET BASE signals, respectively. These signals are latched in latches 2343 and 2345, respectively. They are also concurrently coupled to an adder 2314, which generates an OFF BASE + DEL offset base plus delta signal whose binary-encoded value represents the sum of the binary-encoded values of the OFFSET DELTA and OFFSET BASE signals. The OFF BASE + DEL signal is latched in a latch 2315, which provides the DEST OFFSET destination offset signal.

Contemporaneously, the offset delta increment value and offset base increment value are stored in registers 2323 and 2325, respectively, of the offset delta generating section 2311 and offset base generating section 2312. The initial slot value, represented by block 2326, is coupled through multiplexer 2346 and stored in latch 2347, and the slot increment value is stored in register 2327 of the slot count section 2313.

In addition, various other values are stored in other registers. The destination processing element address generating section 2310 includes registers 2330 and 2331. As noted above, when incrementing to generate the destination processing element address values, at some point the incrementation may generate a value which represents a processing element address beyond the range of processing elements 11(i) or scalar processors 12 participating in the input/output operation. The value in register 2330 is used to assist in detecting such a condition.

As will be described below in connection with Fig. 4A, when incrementing the destination processing element address value, the destination processing element address generating section 2310 selects between the values in registers 2321 and 2331, depending on the relationship between the previously-determined destination processing element address value and the contents of register 2330. The value in register 2330 is used to determine when the destination processing element address value has been incremented to a point at which it would, when next incremented, identify a processing element 11(i) or scalar processor 12 beyond those participating in the input/output operation. Such a value corresponds to (a) the address of the first processing element 11(0) or scalar processor 12 plus the number "NPE" of processing elements or scalar processors participating in the input/output operation, that is, the address just beyond the last participating processing element 11(i) or scalar processor 12, less (b) the amount by which it would be incremented, that is, the address increment value. If the destination processing element address generating section 2310 determines that the previously-determined destination processing element address value is less than the value stored in register 2330, the destination processing element address value, if incremented by the address increment value in register 2321, would remain in its permissible range. In that case, the destination processing element address generating section 2310 uses the value in register 2321 in the incrementation.

However, if the destination processing element address generating section 2310 determines that the previously-determined destination processing element address value is greater than or equal to the value in register 2330, the destination processing element address value, if incremented by the address increment value, would be beyond its permissible range. In that case, as noted above, the incremented destination processing element address value is reduced by a value corresponding to the number "NPE" of processing elements and scalar processors participating in the input/output operation. The contents of register 2331 correspond to the address increment value, reduced by the value "NPE." When this value is added to the previously-determined destination processing element address value, the result is equivalent to reducing the incremented destination processing element address value by the value "NPE."
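The compare-and-select arrangement avoids a separate subtract-after-add step. Comparing the old address against the precomputed register 2330 and adding either register 2321 or register 2331 gives the same result as adding the increment and then wrapping into the partition. The following Python sketch checks this claim exhaustively for small partitions. The names are hypothetical, and the BUMP DEST ADRS input is deliberately left out, because its interaction with the comparison is not described at this point.

```python
def address_registers(address_inc: int, npe: int, first_pe: int) -> tuple[int, int, int]:
    """Contents of registers 2321, 2330 and 2331 as described in the text."""
    reg_2321 = address_inc                    # address increment value
    reg_2330 = first_pe + npe - address_inc   # first address plus NPE, less the increment
    reg_2331 = address_inc - npe              # increment value reduced by NPE
    return reg_2321, reg_2330, reg_2331


def next_address_by_select(address: int, regs: tuple[int, int, int]) -> int:
    """Add register 2321 if the address is below register 2330, otherwise register 2331."""
    reg_2321, reg_2330, reg_2331 = regs
    addend = reg_2321 if address < reg_2330 else reg_2331
    return address + addend


def select_matches_modular_add(max_npe: int, first_pe: int = 0) -> bool:
    """Compare the select rule with (address + increment) wrapped into the partition."""
    for npe in range(1, max_npe + 1):
        for inc in range(npe):
            regs = address_registers(inc, npe, first_pe)
            for address in range(first_pe, first_pe + npe):
                expected = first_pe + (address - first_pe + inc) % npe
                if next_address_by_select(address, regs) != expected:
                    return False
    return True
```

The same pattern, with the striping factor or the number of slots in a frame in place of NPE, applies to the offset delta and slot count sections.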

Similarly, the offset delta generating section 2311 includes two registers 2332 and 2333. As noted above, the offset delta value varies over a range relating to the striping factor, and the values in these registers are used to limit the offset delta value to that range. As will be described below in connection with Fig. 4A, when incrementing the offset delta value, the offset delta generating section 2311 selects between the values in registers 2323 and 2333, depending on the relationship between the previously-determined offset delta value and the contents of register 2332. The value in register 2332 is used to determine when the offset delta value has been incremented to a point at which it would, when next incremented, represent an offset delta value beyond its permissible range, that is, equal to or greater than the striping factor "C." Such a value corresponds to (a) the striping factor "C," less (b) the amount by which it would be incremented, that is, the offset delta increment value. If the offset delta generating section 2311 determines that the previously-determined offset delta value is less than the value stored in register 2332, the offset delta value, if incremented by the offset delta increment value in register 2323, would remain in its permissible range. In that case, the offset delta generating section 2311 uses the value in register 2323 in the incrementation.

However, if the offset delta generating section 2311 determines that the previously-determined offset delta value is greater than or equal to the value in register 2332, the offset delta value, if incremented by the offset delta increment value, would be beyond its permissible range. In that case, as noted above, the incremented offset delta value is reduced by the striping factor "C," and the BUMP DEST ADRS signal is asserted to control the destination processing element address generating section 2310. The contents of register 2333 correspond to the offset delta increment value, reduced by the striping factor "C." When this value is added to the previously-determined offset delta value, the result is equivalent to reducing the incremented offset delta value by the striping factor "C."

The offset base generating section 2312 also has a register 2334, which stores a value corresponding to the offset base increment value plus the striping factor "C." The value in register 2325 is used when the slot count section 2313 determines that the previously-incremented offset base value is to be incremented by the offset base increment value alone. On the other hand, the value in register 2334 is used when, as described above, the offset base value is to be further incremented by an amount corresponding to the striping factor "C," that is, when the slot count section 2313 asserts the BUMP OFFSET BASE signal.

Finally, the slot count section 2313 includes two registers 2335 and 2336. Register 2335 stores a value which is used to determine when the slot index value has been incremented to a point at which it would, when next incremented, represent a slot index value beyond its permissible range. That is, the slot index value would be equal to or greater than the number of slots in a frame, which is the striping factor "C" times the number "NPE" of processing elements 11(i) or scalar processors 12 participating in an input/output operation. The value in register 2335 is the striping factor "C" times the number "NPE," less the slot increment value. The value in register 2336 is the slot increment value less the number of slots in a frame.

As will be described below in connection with Fig. 4B, when incrementing the slot count value, the slot count section 2313 selects between the values in registers 2327 and 2336, depending on the relationship between the previously-determined slot count value and the contents of register 2335. The value in register 2335 is used to determine when the slot count value has been incremented to a point at which it would, when next incremented, identify a slot offset equal to or greater than the number of slots in a frame. Such a value corresponds to (a) the number of slots in a frame, which is the striping factor "C" times the number "NPE" of processing elements 11(i) and scalar processors 12 participating in the input/output operation, less (b) the slot increment value. If the slot count section 2313 determines that the previously-determined slot count value is less than the value stored in register 2335, the slot count value, if incremented by the slot increment value in register 2327, would remain in its permissible range. In that case, the slot count section 2313 uses the value in register 2327 in the incrementation.

However, if the slot count section 2313 determines that the previously-determined slot count value is greater than or equal to the value in register 2335, the slot count value, if incremented by the slot increment value, would identify a slot beyond the end of the current frame. In that case, as noted above, the slot count section 2313 asserts the BUMP OFFSET BASE signal, to enable the offset base generating section 2312 to use the value in register 2334 in the incrementation of the offset base value. In addition, the slot count section 2313 generates a new slot count value, which is incremented by the slot increment value and reduced by a value corresponding to the number of slots in a frame. The contents of register 2336 correspond to the slot increment value, reduced by the value corresponding to the number of slots in a frame. When this value is added to the previously-determined slot count value, the result is equivalent to reducing the incremented slot count value by the value corresponding to the number of slots in a frame.
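Collecting the register contents described in the preceding paragraphs, the following Python sketch computes every precomputed register value from N, C, NPE and the first participating address. The function name is hypothetical. Negative values appear as ordinary Python integers. A fixed-width hardware register would presumably hold them in two's-complement form, but that is an assumption the text does not state.

```python
def register_contents(n: int, c: int, npe: int, first_pe: int) -> dict[int, int]:
    """Precomputed register values, keyed by reference numeral (sketch)."""
    slots = c * npe                       # number of slots in a frame
    address_inc = (n // c) % npe          # destination PE address increment
    delta_inc = n % c                     # offset delta increment
    base_inc = (n // slots) * c           # offset base increment
    slot_inc = n % slots                  # slot increment
    return {
        2321: address_inc,
        2330: first_pe + npe - address_inc,   # PE address threshold
        2331: address_inc - npe,              # PE increment less NPE
        2323: delta_inc,
        2332: c - delta_inc,                  # offset delta threshold
        2333: delta_inc - c,                  # delta increment less C
        2325: base_inc,
        2334: base_inc + c,                   # base increment plus C (used on BUMP OFFSET BASE)
        2327: slot_inc,
        2335: slots - slot_inc,               # slot count threshold
        2336: slot_inc - slots,               # slot increment less slots per frame
    }
```

For example, `register_contents(7, 3, 6, 100)` gives 2 in register 2321, 104 in register 2330, -4 in register 2331, and -11 in register 2336.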

After the various registers have been loaded as described above, and the initial values have been loaded into latches 2341, 2343, 2315, 2345 and 2347 for the initial input/output message packet 2230 to be generated by the input/output buffer node, the various sections 2310, 2311, 2312 and 2313 are enabled to concurrently perform a series of iterations. These iterations facilitate the generation of DEST PE ADRS signals and DEST OFFSET signals for use in connection with generation of input/output message packets 2230 for the subsequent data items PE(x) MSG(y) to be transmitted by the input/output buffer node.

With reference initially to Fig. 4A, in the offset delta generating section 2311, the LAT OFFSET DELTA latched offset delta signals from latch 2343, which at this point have a binary-encoded value corresponding to the initial offset delta value, are coupled to one input terminal of an adder 2350. A second input terminal of adder 2350 receives a SEL OFFSET DELTA INC FACTOR selected offset delta increment factor signal from a multiplexer 2351. Adder 2350 generates INC OFF DEL incremented offset delta signals, which are coupled as the OFFSET DELTA signal to the input terminal of latch 2343 and to one input terminal of adder 2314; adder 2314, in combination with the OFFSET BASE signal generated during the iteration by the offset base generating section 2312 as described below, generates the DEST OFFSET destination offset signal. The INC OFF DEL signal from adder 2350 represents the incremented offset delta value for the iteration.

The SEL OFFSET DELTA INC FACTOR selected offset delta increment factor signal is provided by multiplexer 2351 under control of a comparator 2352. Comparator 2352 also receives the LAT OFFSET DELTA signal from latch 2343, as well as the signal from register 2332, and in response generates the BUMP DEST ADRS bump destination address signal. Comparator 2352 negates the BUMP DEST ADRS signal if it determines that the binary-encoded value of the LAT OFFSET DELTA signal is less than the value represented by the signal from register 2332. In that case, the binary-encoded value of the LAT OFFSET DELTA signal, if incremented by adder 2350 by the offset delta increment value in register 2323, will remain within the permissible range of the offset delta value. Accordingly, the negated BUMP DEST ADRS signal enables multiplexer 2351 to couple the signal from register 2323 as the SEL OFF DELTA INC FACTOR selected offset delta increment factor signal to adder 2350. The adder generates an INC OFF DEL incremented offset delta signal, which multiplexer 2342 couples as the OFFSET DELTA signal to the input terminals of latch 2343 and of adder 2314.

On the other hand, comparator 2352 asserts the BUMP DEST ADRS signal if it determines that the binary-encoded value of the LAT OFFSET DELTA signal is greater than or equal to the value represented by the signal from register 2332. In that case, the binary-encoded value of the LAT OFFSET DELTA signal, if incremented by adder 2350 by the offset delta increment value in register 2323, would be beyond the permissible range of the offset delta value. Accordingly, the asserted BUMP DEST ADRS signal enables multiplexer 2351 to couple the signal from register 2333 as the SEL OFF DELTA INC FACTOR selected offset delta increment factor signal to adder 2350. Since, as noted above, the binary-encoded value of the signal from register 2333 corresponds to the delta increment value reduced by the striping factor "C," the binary-encoded value of the INC OFF DEL incremented offset delta signal generated by the adder will be within the required range. Multiplexer 2342 couples the INC OFF DEL signal as the OFFSET DELTA signal to the input terminals of latch 2343 and of adder 2314.
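The compare-select-add arrangement of the offset delta generating section 2311 can be modeled in software as follows. This is a minimal sketch for illustration, not the circuit itself: the function name and parameters are hypothetical, and the model assumes that the offset delta ranges from 0 to C - 1, that register 2323 holds an offset delta increment value smaller than C, and that register 2332 holds the threshold C minus that increment. The text above states only that register 2333 holds the increment reduced by C; the threshold value is an assumption.

```python
def offset_delta_step(lat_offset_delta: int, delta_inc: int, c: int) -> tuple[int, bool]:
    """Model one iteration of the offset delta generating section (hypothetical).

    Assumed register contents, with 0 <= delta_inc < c:
      register 2323: delta_inc       (offset delta increment value)
      register 2332: c - delta_inc   (comparator threshold, an assumption)
      register 2333: delta_inc - c   (increment reduced by striping factor C)
    Returns (INC OFF DEL value, BUMP DEST ADRS).
    """
    reg_2323, reg_2332, reg_2333 = delta_inc, c - delta_inc, delta_inc - c
    # Comparator 2352: BUMP DEST ADRS is asserted when a plain increment
    # would carry the offset delta out of the range 0..c-1.
    bump_dest_adrs = lat_offset_delta >= reg_2332
    # Multiplexer 2351 selects the increment factor; adder 2350 adds it.
    sel_factor = reg_2333 if bump_dest_adrs else reg_2323
    return lat_offset_delta + sel_factor, bump_dest_adrs


# Example with C = 4 and an increment of 2: 3 wraps to 1 and asserts the bump.
print(offset_delta_step(3, 2, 4))  # (1, True)
```

Under these assumptions the single addition computes (delta + increment) mod C, and BUMP DEST ADRS is the carry out of that modular addition, which is what allows the destination processing element address section to absorb it through a carry-in.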

The destination processing element address generating section 2310 operates in a manner generally similar to the operation of the slot count section 2313. In section 2310, the DEST PE ADRS destination processing element address signals from latch 2341, which at this point have a binary-encoded value corresponding to the initial destination processing element address value, are coupled to one input terminal of an adder 2352. A second input terminal of adder 2352 receives a SEL PE ADRS INCR FACTOR selected processing element address increment factor signal from a multiplexer 2353. Adder 2352 also has a carry-in input terminal that is controlled by the BUMP DEST ADRS bump destination address signal. Adder 2352 generates an INC PE ADRS incremented processing element address signal, which is coupled to the input terminal of latch 2341. The INC PE ADRS signal from adder 2352 represents the incremented destination processing element address value for the iteration.

The SEL PE ADRS INCR FACTOR selected processing element address increment factor signal is provided by multiplexer 2353 under control of a comparator 2354 and a multiplexer 2355. Comparator 2354 receives the DEST PE ADRS destination processing element address signal from latch 2341, as well as the signal from register 2330, and provides two output signals: a RST IF GT reset if greater than signal and a RST IF GE reset if greater than or equal to signal. Comparator 2354 asserts the RST IF GT signal if the binary-encoded value of the DEST PE ADRS signal is greater than the binary-encoded value of the signal from register 2330, and asserts the RST IF GE signal if the binary-encoded value of the DEST PE ADRS signal is greater than or equal to that value. Thus, comparator 2354 asserts the RST IF GE signal, but not the RST IF GT signal, if the binary-encoded value of the DEST PE ADRS signal equals the value stored in register 2330.

Multiplexer 2355, under control of the BUMP DEST ADRS bump destination address signal, selectively couples one of the RST IF GE or RST IF GT signals as a RST PE ADRS reset processing element address signal to control multiplexer 2353. If the offset delta generating section 2311 is asserting the BUMP DEST ADRS signal, multiplexer 2355 couples the RST IF GE reset if greater than or equal to signal to multiplexer 2353 as the RST PE ADRS reset processing element address signal. On the other hand, if section 2311 is negating the BUMP DEST ADRS signal, multiplexer 2355 couples the RST IF GT reset if greater than signal as the RST PE ADRS signal.

Multiplexer 2355 ensures that, when the destination processing element address generating section 2310 uses the BUMP DEST ADRS bump destination address signal, which is coupled to the carry-in input terminal of adder 2352, to further increment the destination processing element address value, it does not increment the value beyond the permissible range of destination processing element address values. If the BUMP DEST ADRS signal is negated, so that the destination processing element address value will not be further incremented thereby, multiplexer 2355 couples the RST IF GT reset if greater than signal as the RST PE ADRS reset processing element address signal. Under this condition, if comparator 2354 determines that the binary-encoded value of the DEST PE ADRS destination processing element address signal is less than or equal to the binary-encoded value of the signal from register 2330, the RST IF GT signal will be negated. The negated BUMP DEST ADRS signal will enable multiplexer 2355 to couple the negated RST IF GT signal to multiplexer 2353, which, in turn, enables multiplexer 2353 to couple a SEL PE ADRS INC FACTOR selected processing element address increment factor signal representing the address increment value to the second input terminal of adder 2352. Adder 2352 generates an INC PE ADRS incremented processing element address signal representing the sum of the binary-encoded values of the DEST PE ADRS signal and the SEL PE ADRS INC FACTOR signal, and multiplexer 2340 couples the INC PE ADRS signal to the input terminal of latch 2341.

If, however, while the BUMP DEST ADRS signal is negated, comparator 2354 determines that the binary-encoded value of the DEST PE ADRS signal is greater than the binary-encoded value of the signal from register 2330, the RST IF GT signal will be asserted. In that case, the RST PE ADRS signal will also be asserted, enabling multiplexer 2353 to couple a SEL PE ADRS INC FACTOR selected processing element address increment factor signal corresponding to the address increment value reduced by the value "NPE" to the second input terminal of adder 2352. Adder 2352 generates an INC PE ADRS incremented processing element address signal representing the sum of the binary-encoded values of the DEST PE ADRS signal and the SEL PE ADRS INC FACTOR signal. Multiplexer 2340 couples the INC PE ADRS signal to the input terminal of latch 2341.

If, on the other hand, the BUMP DEST ADRS signal is asserted, adder 2352 will generate INC PE ADRS incremented processing element address signals whose binary-encoded value corresponds to the sum of the binary-encoded values of the DEST PE ADRS destination processing element address signals and the SEL PE ADRS INC FACTOR selected processing element address increment factor signal, further incremented by one because the BUMP DEST ADRS signal is asserted. In that case, to ensure that adder 2352 does not increment the DEST PE ADRS signal to a destination processing element address beyond those of the processing elements 11(i) and scalar processors 12 participating in the input/output operation, the BUMP DEST ADRS signal enables multiplexer 2355 to couple the RST IF GE reset if greater than or equal to signal as the RST PE ADRS signal.

Accordingly, if comparator 2354 determines that the binary-encoded value of the DEST PE ADRS destination processing element address signal is less than the binary-encoded value of the signal from register 2330, the RST IF GE signal will be negated. The asserted BUMP DEST ADRS signal will enable multiplexer 2355 to couple the negated RST IF GE signal to multiplexer 2353, which, in turn, enables multiplexer 2353 to couple a SEL PE ADRS INC FACTOR selected processing element address increment factor signal representing the address increment value to the second input terminal of adder 2352. Adder 2352 generates an INC PE ADRS incremented processing element address signal representing the sum of the binary-encoded values of the DEST PE ADRS signal and the SEL PE ADRS INC FACTOR signal, together with the asserted BUMP DEST ADRS signal applied to its carry-in input terminal, and multiplexer 2340 couples the INC PE ADRS signal to the input terminal of latch 2341.

If, however, while the BUMP DEST ADRS signal is asserted, comparator 2354 determines that the binary-encoded value of the DEST PE ADRS signal is greater than or equal to the binary-encoded value of the signal from register 2330, the RST IF GE signal will be asserted. In that case, the RST PE ADRS signal will also be asserted, enabling multiplexer 2353 to couple a SEL PE ADRS INC FACTOR selected processing element address increment factor signal corresponding to the address increment value reduced by the value "NPE" to the second input terminal of adder 2352. Adder 2352 generates an INC PE ADRS incremented processing element address signal representing the sum of the binary-encoded values of the DEST PE ADRS signal and the SEL PE ADRS INC FACTOR signal, together with the BUMP DEST ADRS signal at its carry-in input terminal. Multiplexer 2340 couples the INC PE ADRS signal to the input terminal of latch 2341.
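The reset selection performed by comparator 2354 and multiplexers 2355 and 2353 can be sketched as follows. This is a hypothetical software model: it ignores any base processing element address, so addresses run from 0 to NPE - 1, it assumes an address increment value smaller than NPE, and it assumes register 2330 holds NPE minus the increment minus one (the excerpt does not state that value). Under those assumptions the carry-in from BUMP DEST ADRS is the reason RST IF GE, rather than RST IF GT, must be selected when that signal is asserted.

```python
def pe_address_step(dest_pe_adrs: int, pe_inc: int, n_pe: int,
                    bump_dest_adrs: bool) -> int:
    """Model one iteration of the destination PE address section (hypothetical).

    Assumes addresses 0..n_pe-1, 0 <= pe_inc < n_pe, and that register 2330
    holds n_pe - pe_inc - 1.
    """
    reg_2330 = n_pe - pe_inc - 1
    # Comparator 2354 produces both comparison outputs.
    rst_if_gt = dest_pe_adrs > reg_2330
    rst_if_ge = dest_pe_adrs >= reg_2330
    # Multiplexer 2355: with a carry-in of one, the sum overflows one step
    # earlier, so the "greater than or equal" output is used.
    rst_pe_adrs = rst_if_ge if bump_dest_adrs else rst_if_gt
    # Multiplexer 2353: the increment, or the increment reduced by NPE.
    sel_factor = pe_inc - n_pe if rst_pe_adrs else pe_inc
    # Adder 2352, with BUMP DEST ADRS applied to its carry-in terminal.
    return dest_pe_adrs + sel_factor + int(bump_dest_adrs)


print(pe_address_step(2, 1, 4, True))   # 0: 2 + 1 + 1 wraps past NPE = 4
print(pe_address_step(2, 1, 4, False))  # 3: no reset needed
```

In this model, swapping the two comparator outputs would break the range guarantee at the boundary: with the address equal to the register 2330 value, using RST IF GT while the carry-in is asserted would yield NPE, one past the last address, and using RST IF GE without a carry-in would yield -1.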

With reference to Fig. 4B, in the slot count section 2313, the LAT SLOT INDEX latched slot index signal from latch 2347, which at this point has a binary-encoded value corresponding to the initial slot index value, is coupled to one input terminal of an adder 2360. A second input terminal of adder 2360 receives a SEL SLOT INDEX INC FACTOR selected slot index increment factor signal from a multiplexer 2361. Adder 2360 generates an INC SLOT INDEX incremented slot index signal, which multiplexer 2346 couples as a SLOT INDEX signal to the input terminal of latch 2347. The SEL SLOT INDEX INC FACTOR selected slot index increment factor signal is provided by multiplexer 2361 under control of a comparator 2362.

Comparator 2362 also receives the LAT SLOT INDEX signal from latch 2347, as well as the signal from register 2335, and in response generates the BUMP OFFSET BASE bump offset base signal. Comparator 2362 negates the BUMP OFFSET BASE signal if it determines that the binary-encoded value of the LAT SLOT INDEX signal is less than the value represented by the signal from register 2335. In that case, the binary-encoded value of the LAT SLOT INDEX signal, if incremented by adder 2360 by the slot increment value in register 2327, will remain within the permissible range of the slot index value. Accordingly, the negated BUMP OFFSET BASE signal enables multiplexer 2361 to couple the signal from register 2327 as the SEL SLOT INDEX INC FACTOR selected slot index increment factor signal to adder 2360. The adder generates an INC SLOT INDEX incremented slot index signal, which multiplexer 2346 couples as the SLOT INDEX signal to the input terminal of latch 2347.

On the other hand, comparator 2362 asserts the BUMP OFFSET BASE signal if it determines that the binary-encoded value of the LAT SLOT INDEX signal is greater than or equal to the value represented by the signal from register 2335. In that case, the binary-encoded value of the LAT SLOT INDEX signal, if incremented by adder 2360 by the slot increment value in register 2327, would be beyond the permissible range of the slot index value. Accordingly, the asserted BUMP OFFSET BASE signal enables multiplexer 2361 to couple the signal from register 2336 as the SEL SLOT INDEX INC FACTOR selected slot index increment factor signal to adder 2360. Since, as noted above, the binary-encoded value of the signal from register 2336 corresponds to the slot increment value reduced by the number of slots in a frame, the binary-encoded value of the INC SLOT INDEX incremented slot index signal generated by adder 2360 will be within the required range. Multiplexer 2346 couples the INC SLOT INDEX signal as the SLOT INDEX signal to the input terminal of latch 2347.
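The slot count section 2313 follows the same compare-select-add pattern, with its comparator output driving the offset base section rather than a carry-in. The sketch below traces a few iterations. It is a hypothetical model that assumes slot index values run from 0 to one less than the number of slots in a frame, that register 2336 holds the slot increment value reduced by that number (as stated above), and that register 2335 holds the number of slots in a frame minus the slot increment value (an assumption).

```python
def slot_index_step(lat_slot_index: int, slot_inc: int,
                    slots_per_frame: int) -> tuple[int, bool]:
    """Model one iteration of the slot count section (hypothetical).

    Returns (INC SLOT INDEX value, BUMP OFFSET BASE).
    """
    reg_2327 = slot_inc                      # slot increment value
    reg_2335 = slots_per_frame - slot_inc    # comparator threshold (assumed)
    reg_2336 = slot_inc - slots_per_frame    # increment reduced by slots per frame
    # Comparator 2362
    bump_offset_base = lat_slot_index >= reg_2335
    # Multiplexer 2361 and adder 2360
    sel_factor = reg_2336 if bump_offset_base else reg_2327
    return lat_slot_index + sel_factor, bump_offset_base


# Trace five iterations with 8 slots per frame and a slot increment of 3.
slot, trace = 1, []
for _ in range(5):
    slot, bump = slot_index_step(slot, 3, 8)
    trace.append((slot, bump))
print(trace)
# [(4, False), (7, False), (2, True), (5, False), (0, True)]
```

In the trace, each asserted BUMP OFFSET BASE marks an iteration whose data item falls in the next frame.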

In the offset base generating section 2312, the LAT OFFSET BASE latched offset base signal from latch 2345, which at this point has a binary-encoded value corresponding to the initial offset base value, is coupled to one input terminal of an adder 2363. A second input terminal of adder 2363 receives a SEL OFF BASE INC FACTOR selected offset base increment factor signal from a multiplexer 2364. Adder 2363 generates an INC OFF BASE incremented offset base signal, which multiplexer 2344 couples as the OFFSET BASE signal to the input terminal of latch 2345 and to one input terminal of adder 2314. As described above, adder 2314 generates an OFFSET BASE + DEL offset base plus delta signal, whose binary-encoded value corresponds to the sum of the binary-encoded values of the OFFSET BASE and OFFSET DELTA signals, and which is coupled to the input terminal of latch 2315.

The SEL OFF BASE INC FACTOR selected offset base increment factor signal is provided by multiplexer 2364 under control of the BUMP OFFSET BASE signal from comparator 2362. As described above, comparator 2362 negates the BUMP OFFSET BASE signal if it determines that the binary-encoded value of the LAT SLOT INDEX signal is less than the value represented by the signal from register 2335. In that case, the binary-encoded value of the LAT SLOT INDEX signal, if incremented by adder 2360 by the slot increment value in register 2327, will remain within the permissible range of the slot index value, and the negated BUMP OFFSET BASE signal enables multiplexer 2364 to couple the signal from register 2325, representing the offset base increment value, as the SEL OFF BASE INC FACTOR selected offset base increment factor signal to adder 2363. Adder 2363 generates an INC OFF BASE incremented offset base signal, which multiplexer 2344 couples as the OFFSET BASE signal to the input terminals of latch 2345 and adder 2314.

On the other hand, comparator 2362 asserts the BUMP OFFSET BASE signal if it determines that the binary-encoded value of the LAT SLOT INDEX signal is greater than or equal to the value represented by the signal from register 2335. In that case, the binary-encoded value of the LAT SLOT INDEX signal, if incremented by adder 2360 by the slot increment value in register 2327, would be beyond the permissible range of the slot index value. Accordingly, the asserted BUMP OFFSET BASE signal enables multiplexer 2364 to couple the signal from register 2334, representing the offset base increment value plus the striping factor "C," as the SEL OFF BASE INC FACTOR selected offset base increment factor signal to adder 2363. Adder 2363 then generates an INC OFF BASE incremented offset base signal whose binary-encoded value corresponds to the binary-encoded value of the LAT OFFSET BASE signal incremented by both the offset base increment value and the striping factor "C."
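The offset base generating section 2312 and adder 2314 can be modeled as below. This is a minimal, hypothetical sketch: it takes the BUMP OFFSET BASE value as an input (as produced by the slot count section) and models register 2325 as the offset base increment value and register 2334 as that value plus the striping factor C, as described above. The function names are illustrative only.

```python
def offset_base_step(lat_offset_base: int, base_inc: int, c: int,
                     bump_offset_base: bool) -> int:
    """Model one iteration of the offset base generating section (hypothetical)."""
    reg_2325 = base_inc        # offset base increment value
    reg_2334 = base_inc + c    # offset base increment value plus striping factor C
    # Multiplexer 2364 selects under control of BUMP OFFSET BASE; adder 2363 adds.
    return lat_offset_base + (reg_2334 if bump_offset_base else reg_2325)


def dest_offset(offset_base: int, offset_delta: int) -> int:
    """Adder 2314: OFFSET BASE + DEL, latched into latch 2315 as DEST OFFSET."""
    return offset_base + offset_delta


# With C = 2 and a zero base increment, the base advances by C only on a bump.
base = 0
for bump in [False, False, True, False, True]:
    base = offset_base_step(base, 0, 2, bump)
print(base, dest_offset(base, 1))  # 4 5
```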

As noted above, the various sections 2310, 2311, 2312 and 2313 of the parallel send address/offset generator 2278(i) iteratively perform these operations to generate the DEST PE ADRS destination processing element address signals and DEST OFFSET destination offset signals used in generating the input/output message packets 2230. During each iteration, the input/output message packet 2230 transmitted by input/output buffer 2201(i) includes one data item PE(x) MSG(y) from its buffer memory 2223(i). After input/output buffer 2201(i) has transmitted all of the data items PE(x) MSG(y), it may terminate the input/output operation.
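Taken together, one iteration of all four sections can be sketched as the generator below. This excerpt does not define the data layout, so the model assumes one that is consistent with the register roles described above: item g of the overall transfer is held by input/output buffer g mod N_IOB and is delivered to processing element floor(g / C) mod NPE at destination offset C * floor(g / (C * NPE)) + (g mod C), with C * NPE slots per frame and no base processing element address. The function and parameter names are hypothetical, and the increment and threshold values are derived from that assumed layout rather than taken from the patent.

```python
from collections.abc import Iterator


def dest_sequence(i: int, n_iob: int, n_pe: int, c: int,
                  count: int) -> Iterator[tuple[int, int]]:
    """Yield (DEST PE ADRS, DEST OFFSET) for the first `count` items of buffer i.

    Hypothetical model under an assumed layout: item g sits in buffer
    g % n_iob and goes to PE (g // c) % n_pe at offset
    c * (g // (c * n_pe)) + g % c, so consecutive items of one buffer are
    n_iob apart in g.
    """
    s = c * n_pe                                  # assumed slots per frame
    # Increments derived from the assumed layout (item index grows by n_iob).
    d_inc, p_inc = n_iob % c, (n_iob // c) % n_pe
    s_inc, b_inc = n_iob % s, c * (n_iob // s)
    # Initial values for item g = i (the buffer's position).
    delta, pe, slot, base = i % c, (i // c) % n_pe, i % s, c * (i // s)
    for _ in range(count):
        yield pe, base + delta                    # adder 2314 forms DEST OFFSET
        # Section 2311: offset delta and BUMP DEST ADRS.
        bump_dest = delta >= c - d_inc
        delta += d_inc - c if bump_dest else d_inc
        # Section 2310: PE address, choosing RST IF GE or RST IF GT.
        limit = n_pe - p_inc - 1
        reset = pe >= limit if bump_dest else pe > limit
        pe += (p_inc - n_pe if reset else p_inc) + int(bump_dest)
        # Section 2313: slot index and BUMP OFFSET BASE.
        bump_base = slot >= s - s_inc
        slot += s_inc - s if bump_base else s_inc
        # Section 2312: offset base.
        base += b_inc + c if bump_base else b_inc


print(list(dest_sequence(1, 3, 4, 2, 5)))
# [(0, 1), (2, 0), (3, 1), (1, 2), (2, 3)]
```

In this model each iteration needs only comparisons and additions against values fixed before the transfer begins, whereas evaluating the assumed layout formula directly would need a division by C and by C * NPE for every item.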

It will be appreciated that numerous modifications may be made to the parallel send address/offset generator 2278(i) described above. For example, instead of providing separate adders and comparators for the various sections 2310, 2311, 2312 and 2313, the parallel send address/offset generator may have a single adder and comparator shared among the various sections. In such an embodiment, the adder and comparator would be used in separate phases, one phase each to generate signals representing the destination processing element address value, offset delta value, offset base value and slot index value. In that case, the adder and comparator would be used to generate the offset delta value before the destination processing element address value, since the BUMP DEST ADRS signal is required to generate the destination processing element address value. Similarly, the adder and comparator would be used to generate the slot index value before the offset base value, since the BUMP OFFSET BASE signal is required to generate the offset base value. Such an embodiment may be useful in reducing the physical size of the circuit comprising the parallel send address/offset generator 2278(i), although it may require more time to generate the destination processing element address value and destination offset value, since they are determined in a four-phase sequence.
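The shared adder and comparator variant can be sketched as four sequential phases. This is a hypothetical model: the register values are hard-coded for one assumed configuration (three input/output buffers, four participating processing elements, striping factor C = 2, eight slots per frame, derived from an assumed data layout rather than from the patent), and the dictionary keys are illustrative names, not register numbers. The phase order is the point of the sketch: phase 1 must precede phase 2, and phase 3 must precede phase 4, because each later phase consumes the bump signal produced by the earlier one.

```python
def shared_comparator(a: int, b: int) -> tuple[bool, bool]:
    """The single comparator: returns (a > b, a >= b)."""
    return a > b, a >= b


def shared_adder(a: int, b: int, carry_in: int = 0) -> int:
    """The single adder, including its carry-in input."""
    return a + b + carry_in


# Hypothetical register contents for the assumed configuration above.
REGS = {
    "delta_inc": 1, "delta_limit": 1, "delta_wrap": -1,
    "pe_inc": 1, "pe_limit": 2, "pe_wrap": -3,
    "slot_inc": 3, "slot_limit": 5, "slot_wrap": -5,
    "base_inc": 0, "base_bumped": 2,
}


def four_phase_iteration(s: dict, r: dict) -> dict:
    """Advance one item using the shared units in four sequential phases."""
    s = dict(s)
    # Phase 1: offset delta; produces BUMP DEST ADRS for phase 2.
    _, bump_dest = shared_comparator(s["delta"], r["delta_limit"])
    s["delta"] = shared_adder(s["delta"],
                              r["delta_wrap"] if bump_dest else r["delta_inc"])
    # Phase 2: destination PE address, with BUMP DEST ADRS as carry-in.
    gt, ge = shared_comparator(s["pe"], r["pe_limit"])
    reset = ge if bump_dest else gt
    s["pe"] = shared_adder(s["pe"], r["pe_wrap"] if reset else r["pe_inc"],
                           int(bump_dest))
    # Phase 3: slot index; produces BUMP OFFSET BASE for phase 4.
    _, bump_base = shared_comparator(s["slot"], r["slot_limit"])
    s["slot"] = shared_adder(s["slot"],
                             r["slot_wrap"] if bump_base else r["slot_inc"])
    # Phase 4: offset base (only the adder is needed).
    s["base"] = shared_adder(s["base"],
                             r["base_bumped"] if bump_base else r["base_inc"])
    return s


state = {"pe": 0, "delta": 1, "slot": 1, "base": 0}  # initial values for buffer 1
out = []
for _ in range(5):
    out.append((state["pe"], state["base"] + state["delta"]))
    state = four_phase_iteration(state, REGS)
print(out)  # [(0, 1), (2, 0), (3, 1), (1, 2), (2, 3)]
```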

In addition, it will be appreciated that the destination processing element address value and destination offset value may be determined using a suitably-programmed microprocessor.

The foregoing description has been limited to a specific embodiment of this invention. It will be apparent, however, that variations and modifications may be made to the invention, with the attainment of some or all of the advantages of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

What is claimed as new and desired to be secured by Letters Patent is:

CLAIMS

1. A computer comprising a plurality of processing elements, and an input/output processor interconnected by a routing network,

A. said routing network transferring messages between said processing elements and said input/output processor;

B. said processing elements performing processing operations in connection with data received from said input/output processor in messages transferred over said routing network and transferring processed data to said input/output processor in messages over said routing network, said processing elements being connected as a first selected series of leaf nodes;

C. said input/output processor including a plurality of input/output buffers connected as a second selected series of leaf nodes of said routing network for generating messages for transfer over said routing network to a series of processing elements forming at least a selected subset of the processing elements during an input/output operation.

2. A computer as defined in claim 1 in which said input/output processor further receives messages over said routing network from a series of processing elements forming at least a selected subset of the processing elements during an input/output operation.

3. A computer as defined in claim 1 further comprising at least one control processor and a control network, said control processor generating processing control messages for transfer to said processing elements over said control network to control said processing elements.

4. A computer as defined in claim 3 comprising a plurality of control processors each generating processing control messages for transfer to at least selected ones of said processing elements over said control network to control said processing elements, said control network being partitionable to define a plurality of partitions each facilitating the transfer of processing control messages between at least one control processor and selected ones of said processing elements.

5. A computer as defined in claim 3 in which said control processor further generates input/output control messages and said input/output processor further includes a common control for receiving said input/output control messages and controlling said input/output buffers to perform input/output operations in response thereto.

6. A computer as defined in claim 5 in which each processing element in a selected subset is identified by an address and includes a processing element receive buffer for buffering data from messages received from said routing network during an input/output operation, each processing element buffering data received in a message at an offset in said processing element buffer identified by a destination offset value in the message, each input/output buffer including:

A. a transmit buffer for buffering a plurality of data items, each to be transmitted in a message to a processing element; and

B. a destination processing element address and offset generator for iteratively generating a destination processing element address value and a destination offset value.
7. A computer as defined in claim 6 in which the transmit buffer of each input/output buffer includes a plurality of storage locations at a series of source offsets, each storage location storing a data item used in a message, the data items defining a succession of frames of storage locations in the transmit buffers of said input/output buffers, each frame being organized first in order of successive input/output buffers in storage locations having the same source offset and second in order of storage locations in each input/output buffer having successive source offsets so as to include data items to be received by the series of processing elements participating in the input/output operation for storage in their respective processing element receive buffers at the same destination offset value, the destination processing element address and offset generator iteratively generating destination processing element address values and destination offset values in response to the number of input/output buffers and the number of processing elements participating in the input/output operation.

8. A computer as defined in claim 7 in which the destination processing element address and offset generator further generates during an initial iteration an initial destination processing element address value and an initial destination offset value both related to the number of input/output buffers, the number of processing elements participating in the input/output operation, and the position of the input/output buffer among the input/output buffers participating in the input/output operation, the destination processing element address and offset generator during subsequent iterations generating a destination processing element address value and destination offset value in response to the initial destination processing element address value and an initial destination offset value.

9. A computer as defined in claim 7 in which said destination processing element address and offset generator further generates said destination processing element address value in response to a base processing element address value identifying a predetermined one of the processing elements in the series of processing elements participating in the input/output operation.

10. A computer as defined in claim 7 in which said destination processing element address and offset generator comprises:

A. a destination processing element address value generator for, during successive iterations, generating destination processing element address values in response to an initial destination processing element address value, the number of input/output buffers and the number of processing elements participating in the input/output operation, said destination processing element address value generated during an iteration identifying the data item to be used in a message during the iteration within the sequence of data items comprising its frame; and

B. a destination offset value generator for, during successive iterations, generating destination offset values in response to an initial destination offset value, the number of input/output buffers and the number of processing elements participating in the input/output operation, the destination offset value generated during an iteration identifying the frame containing the data item to be used in a message during the iteration within the sequence of frames to be transferred.
11. A computer as defined in claim 10 in which said destination processing element address value generator includes:

A. a destination processing element address value store for storing a destination processing element address value;

B. an address incrementation value store for storing an address incrementation value; and

C. a destination address value incrementation circuit for generating, during each iteration, an incremented destination processing element address value in response to the destination processing element address value stored in said destination processing element address value store and the address incrementation value, the incremented destination processing element address value being stored in the destination processing element address value store as the destination processing element address value for use during the next iteration.

12. A computer as defined in claim 11 in which the address incrementation value stored in the address incrementation value store is related to the number of processing elements and the number of input/output buffers participating in the input/output operation.

13. A computer as defined in claim 11 wherein said destination processing element address value generator further includes a destination address initialization circuit for enabling the destination processing element address value store to store an initial destination processing element address value related to the number of input/output buffers, the number of processing elements participating in the input/output operation, and the position of the input/output buffer among the input/output buffers participating in the input/output operation.

14. A computer as defined in claim 11 wherein said destination address value incrementation circuit further includes a destination processing element address value range limitation circuit for limiting the incremented destination processing element address value to an address value range corresponding to the address values of the processing elements participating in the input/output operation.

15. A computer as defined in claim 14 in which:

A. said destination processing element address value incrementation circuit further includes:

i. a destination processing element address value range limitation store for storing a limitation value relating to an upper end of the address value range; and

ii. an address reset store for storing an address reset value;

B. said destination processing element address value range limitation circuit includes:

i. a selector circuit for selectively coupling either the address incrementation value from said address incrementation value store or the address reset value from said address reset store to said destination address value incrementation circuit in response to a selection control signal; and

ii. a comparator for generating said selection control signal in response to the destination processing element address value from said destination processing element address value store and the limitation value from said destination processing element address value range limitation store, the address reset value and the limitation value being selected to ensure that the incremented destination processing element address value generated by said destination address value incrementation circuit is within said address value range.

16. A computer as defined in claim 10 in which said destination offset value generator includes:

A. a destination offset value store for storing a destination offset value;

B. an offset incrementation value store for storing an offset incrementation value; and

C. a destination offset value incrementation circuit for generating, during each iteration, an incremented destination offset value in response to the destination offset value stored in said destination offset value store and the offset incrementation value, the incremented destination offset value being stored in the destination offset value store as the destination offset value for use during the next iteration.

17. A computer as defined in claim 16 in which the offset incrementation value stored in the offset incrementation value store is related to the number of processing elements and the number of input/output buffers participating in the input/output operation.

18. A computer as defined in claim 16 wherein said destination offset value generator further includes a destination offset initialization circuit for enabling the destination offset value store to store an initial destination offset value related to the number of processing elements participating in the input/output operation and the position of the input/output buffer among the input/output buffers participating in the input/output operation.

19. A computer as defined in claim 7 in which each frame is further defined as including a series of stripes, the series including data items each to be received by the series of processing elements participating in the input/output operation, each stripe including a predetermined number of data items to be received by the series of processing elements participating in the input/output operation for storage in their respective processing element receive buffers at successive destination offset values, said destination processing element address value generator further generating said destination processing element address values and destination offset values in response to the number of data items in each stripe.
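Claims 19 to 22 name the quantities involved (frames, stripes, destination addresses and offsets) but not the arithmetic that links them. The Python sketch below shows one consistent reading, which is an assumption and is not stated in the claims. Under this reading, the input/output buffers are fed round-robin from a single item stream, so the buffer in position j of B handles items j, j + B, j + 2B, and so on. A frame holds one stripe of s items for each of the P participating processing elements. A stripe's items land at successive offsets in a single processing element's receive buffer. The names `Destination`, `destination` and `buffer_schedule` are hypothetical.

```python
from typing import NamedTuple


class Destination(NamedTuple):
    address: int  # destination processing element address
    offset: int   # destination offset within that element's receive buffer


def destination(g: int, num_pes: int, stripe: int, base_pe: int = 0) -> Destination:
    """Map stream item g to its destination under the assumed striped layout."""
    frame, slot = divmod(g, num_pes * stripe)  # frame number, position within the frame
    stripe_index, delta = divmod(slot, stripe)  # stripe (one per element), position in stripe
    base_offset = frame * stripe                # offset of the frame's first item in each buffer
    return Destination(base_pe + stripe_index, base_offset + delta)


def buffer_schedule(position: int, num_buffers: int, num_pes: int, stripe: int,
                    iterations: int, base_pe: int = 0) -> list[Destination]:
    """Destinations, iteration by iteration, for the buffer at `position`."""
    return [destination(position + k * num_buffers, num_pes, stripe, base_pe)
            for k in range(iterations)]


# Three processing elements, two-item stripes, two buffers; the buffer in position 1.
sched = buffer_schedule(1, 2, 3, 2, 6)
print([d.address for d in sched])  # [0, 1, 2, 0, 1, 2]
print([d.offset for d in sched])   # [1, 1, 1, 3, 3, 3]
```

Under this reading, the stripe index selects the destination processing element. The offset splits into two parts: the frame number scaled by the stripe length, and the position within the stripe. These correspond to the base and delta offsets of claim 28. The optional `base_pe` argument plays the role of the base processing element address of claim 21.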

20. A computer as defined in claim 19 in which the destination processing element address and offset generator further generates during an initial iteration an initial destination processing element address value and an initial destination offset value both related to the number of input/output buffers, the number of processing elements participating in the input/output operation, the position of the input/output buffer among the input/output buffers participating in the input/output operation, and the number of data items in each stripe, the destination processing element address and offset generator during subsequent iterations generating a destination processing element address value and destination offset value in response to the initial destination processing element address value and an initial destination offset value.

21. A computer as defined in claim 19 in which said destination processing element address and offset generator further generates said destination processing element address value in response to a base processing element address value identifying a predetermined one of the processing elements in the series of processing elements participating in the input/output operation.

22. A computer as defined in claim 19 in which said destination processing element address and offset generator comprises:
A. a destination offset value generator for, during successive iterations, generating destination offset values in response to an initial destination offset value, the number of input/output buffers, the number of processing elements participating in the input/output operation, and the number of data items in a stripe, said destination offset value for each iteration identifying the frame and the position of the data item to be used in a message during the iteration within the sequence of data items comprising its stripe, the destination offset value generator further generating a destination address control signal having selected conditions; and
B. a destination processing element address value generator for, during successive iterations, generating destination processing element address values in response to an initial destination processing element address value, the number of input/output buffers, the number of processing elements participating in the input/output operation, and the condition of the destination address control signal, the destination processing element address value for each iteration identifying the stripe which contains the data item to be used in a message during the iteration within the sequence of stripes comprising a frame.

23. A computer as defined in claim 22 in which said destination processing element address value generator includes:
A. a destination processing element address value store for storing a destination processing element address value;
B. an address incrementation value store for storing an address incrementation value; and
C. a destination address value incrementation circuit for generating, during each iteration, an incremented destination processing element address value in response to the destination processing element address value stored in said destination processing element address value store, the address incrementation value, and the condition of the destination address control signal, the incremented destination processing element address value being stored in the destination processing element address value store as the destination processing element address value.

24. A computer as defined in claim 23 in which the address incrementation value stored in the address incrementation value store is related to the number of processing elements and the number of input/output buffers participating in the input/output operation.

25. A computer as defined in claim 23 wherein said destination processing element address value generator further includes a destination initialization circuit for enabling the destination processing element address value store to store an initial destination offset value both related to the number of input/output buffers, the number of processing elements participating in the input/output operation, the position of the input/output buffer among the input/output buffers participating in the input/output operation, and the number of data items in a stripe.

26. A computer as defined in claim 23 wherein said destination address value incrementation circuit further includes a destination processing element address value range limitation circuit for limiting the incremented destination processing element address value to an address value range corresponding to the address values of the processing elements participating in the input/output operation.

27. A computer as defined in claim 26 in which:
A. said destination processing element address value incrementation circuit further includes:
i. a destination processing element address value range limitation store for storing a limitation value relating to an upper end of the address value range; and
ii. an address reset store for storing an address reset value;
B. said destination processing element address value range limitation circuit includes:
i. a selector circuit for selectively coupling either the address incrementation value from said address incrementation value store or the address reset value from said address reset store to said destination address value incrementation circuit in response to a selection control signal; and
ii. a comparator for generating said selection control signal in response to the destination processing element address value from said destination processing element address value store and the limitation value from said destination processing element address value range limitation store, the address reset value and the limitation value being selected to ensure that the incremented destination processing element address value generated by said destination address value incrementation circuit is within said address value range.

28. A computer as defined in claim 22 in which said destination offset value generator includes:
A. a destination offset base value generator for generating a destination base offset value during each iteration, said destination base offset value identifying the frame containing the data item to be used in a message during the iteration;
B. a destination offset delta generator for generating a destination delta offset value during each iteration, the destination delta offset value identifying the position of the data item to be used in a message during the iteration within the sequence of data items comprising its stripe; and
C. a destination offset combination value generator for generating said destination offset value in response to said destination base offset value and said destination delta offset value.

29. A computer as defined in claim 28 in which said destination offset base value generator comprises:
A. a destination base offset value store for storing a destination base offset value to be used by the destination offset combination value generator;
B. a destination base offset value incrementation circuit for generating, during each iteration, an incremented destination base offset value in response to the destination base offset value stored in said destination base offset value store and a base offset incrementation value, the incremented destination base offset value being stored in the destination base offset value store as the destination offset base value for use during the next iteration;
C. a base offset incrementation value circuit for providing a base offset incrementation value, said base offset incrementation value circuit comprising:
i. a base offset base incrementation value store for storing a base incrementation value;
ii. a base offset enhanced incrementation value store for storing an enhanced incrementation value reflecting the base incrementation value and the number of data items in a stripe; and
iii. a base offset incrementation value selector for selectively coupling one of the base incrementation value or the enhanced incrementation value as the base offset incrementation value in response to a slot signal; and
D. a slot count circuit for maintaining a running count of the data item to be transmitted in a message during an iteration in a sequence of data items within a frame and generating the slot count signal in response to the running count and the number of data items within a frame.

30. A computer as defined in claim 29 wherein said destination base offset value generator further includes a destination base offset initialization circuit for enabling the destination base offset value store to store an initial destination base offset value related to the number of processing elements participating in the input/output operation, the position of the input/output buffer among the input/output buffers participating in the input/output operation, and the number of data items in a stripe.

31. A computer as defined in claim 29 in which said slot count circuit comprises:
A. a slot count store for storing a slot count value;
B. a slot count incrementation value store for storing a slot count incrementation value; and
C. a slot count incrementation circuit for generating, during each iteration, an incremented slot count value in response to the slot count value stored in said slot count store and the slot count incrementation value, the incremented slot count value being stored in the slot count store as the slot count value to be used in the next iteration.

32. A computer as defined in claim 31 in which the slot incrementation value stored in the slot count incrementation value store is related to the number of processing elements, the number of input/output buffers participating in the input/output operation, and the number of data items in a stripe.

33. A computer as defined in claim 31 wherein said slot count circuit further includes a slot count initialization circuit for enabling the slot count store to store an initial slot count value related to the number of processing elements participating in the input/output operation, the position of the input/output buffer among the input/output buffers participating in the input/output operation, and the number of data items in a stripe.

34. A computer as defined in claim 31 wherein said slot count circuit further includes a slot count
value range limitation circuit for limiting the incremented slot count value to a slot count value range corresponding to the number of data items in a frame.
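Claims 29 to 34 obtain the frame part of the offset without division. A slot count tracks where the current item falls within its frame. When that count wraps past the end of the frame, the selector substitutes the enhanced incrementation value for the base incrementation value. The Python sketch below assumes that buffer j of B handles items j, j + B, j + 2B, and so on, and that a frame holds P stripes of s items, so F = P * s items per frame. The register contents are illustrative choices that satisfy the relationships the claims recite, not values taken from the specification: slot increment B mod F, base increment s times the integer part of B / F, and an enhanced increment one stripe length larger. `base_offsets` is a hypothetical name.

```python
def base_offsets(position: int, num_buffers: int, num_pes: int, stripe: int,
                 iterations: int) -> list[tuple[int, int]]:
    """Return (slot count, base offset) for each iteration of one buffer."""
    frame_size = num_pes * stripe
    # Hypothetical contents of the incrementation value stores.
    slot_incr = num_buffers % frame_size
    base_incr = (num_buffers // frame_size) * stripe
    enhanced_incr = base_incr + stripe          # one extra frame's worth of offset
    # Hypothetical initial values from the initialization circuits.
    slot = position % frame_size
    base = (position // frame_size) * stripe
    history = []
    for _ in range(iterations):
        history.append((slot, base))
        wraps = slot + slot_incr >= frame_size  # slot signal: the count crosses a frame boundary
        base += enhanced_incr if wraps else base_incr
        # Range limitation: keep the slot count within [0, frame_size).
        slot += slot_incr - frame_size if wraps else slot_incr
    return history


print(base_offsets(1, 2, 3, 2, 6))
# [(1, 0), (3, 0), (5, 0), (1, 2), (3, 2), (5, 2)]
```

The sequence matches a direct computation of (g mod F, s times the integer part of g / F) for g = j + kB. In this model, the slot count wrapping is exactly the event that calls for the enhanced incrementation value.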

35. A computer as defined in claim 28 wherein said destination offset delta generator comprises:
A. a destination delta offset value store for storing a destination delta offset value to be used by the destination offset combination value generator;
B. a destination delta offset value incrementation circuit for generating, during each iteration, an incremented destination delta offset value in response to the destination delta offset value stored in said destination delta offset value store and a delta offset incrementation value, the incremented destination delta offset value being stored in the destination delta offset value store as the destination delta offset value for use during the next iteration;
C. a delta offset incrementation value circuit for providing a base offset incrementation value, said base offset incrementation value circuit comprising:
i. a delta offset base incrementation value store for storing a delta incrementation value;
ii. a delta offset reduced incrementation value store for storing a reduced delta incrementation value reflecting the delta incrementation value and the number of data items in a stripe; and
iii. a delta offset incrementation value selector for selectively coupling one of the delta incrementation value or the reduced delta incrementation value as the base offset incrementation value in response to the destination delta offset value and the number of data items in a stripe.

36. A computer as defined in claim 35 wherein said destination delta offset value generator further includes a destination delta offset initialization circuit for enabling the destination delta offset value store to store an initial destination delta offset value related to the position of the input/output buffer among the input/output buffers participating in the input/output operation and the number of data items in a stripe.

37. A computer as defined in claim 35 wherein said destination delta offset value generator further includes a destination delta offset value range limitation circuit for generating said destination address control signal and for limiting the incremented delta offset value to a delta offset value range corresponding to the number of data items in a stripe.
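The address and offset generator of claims 22, 23 and 27 to 37 can be exercised end to end as a set of registers stepped once per iteration. The Python sketch below keeps one register for each store named in the claims: delta offset, processing element address, slot count and base offset. Each iteration updates them with additions, comparisons and selections, and the result is checked against a direct `divmod` computation. It assumes a layout in which buffer j of B handles items j, j + B, j + 2B, and so on, and a frame holds one stripe of s items for each of the P processing elements. In that layout, the delta offset's wrap serves as the destination address control signal that adds one to the address. The function names `direct` and `incremental` and all register contents are hypothetical. The range limitation on the address is written as one conditional subtraction for brevity rather than as a comparator with a reset store.

```python
def direct(g: int, num_pes: int, stripe: int) -> tuple[int, int]:
    """Reference (address, offset) for stream item g, computed with division."""
    frame, slot = divmod(g, num_pes * stripe)
    stripe_index, delta = divmod(slot, stripe)
    return stripe_index, frame * stripe + delta


def incremental(position: int, num_buffers: int, num_pes: int, stripe: int,
                iterations: int) -> list[tuple[int, int]]:
    """Same sequence for one buffer, using only adds, compares and selects per iteration."""
    frame_size = num_pes * stripe
    # Hypothetical contents of the incrementation value stores (set once per operation).
    delta_incr = num_buffers % stripe
    reduced_delta_incr = delta_incr - stripe     # wraps the delta offset back into the stripe
    addr_incr = (num_buffers // stripe) % num_pes
    slot_incr = num_buffers % frame_size
    base_incr = (num_buffers // frame_size) * stripe
    enhanced_base_incr = base_incr + stripe
    # Hypothetical initial register values, derived from the buffer's position.
    delta = position % stripe
    addr = (position // stripe) % num_pes
    slot = position % frame_size
    base = (position // frame_size) * stripe
    out = []
    for _ in range(iterations):
        out.append((addr, base + delta))          # combination generator: base + delta
        carry = delta + delta_incr >= stripe      # destination address control signal
        delta += reduced_delta_incr if carry else delta_incr
        addr += addr_incr + int(carry)
        if addr >= num_pes:                       # range limitation on the element address
            addr -= num_pes
        slot_wraps = slot + slot_incr >= frame_size
        slot += slot_incr - frame_size if slot_wraps else slot_incr
        base += enhanced_base_incr if slot_wraps else base_incr
    return out


# Two processing elements, two-item stripes, three buffers; the buffer in position 0.
print(incremental(0, 3, 2, 2, 5))  # [(0, 0), (1, 1), (1, 2), (0, 5), (0, 6)]
```

The two functions agree across many parameter choices, which shows this reading is self-consistent. It does not show that the patented circuit uses these particular register values.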

38. A computer comprising:
A. a plurality of processing elements for performing processing operations in accordance with processing control messages to generate processed data in connection with data received in data messages and for generating data messages containing said processed data;
B. a plurality of control processors for generating said processing control messages for controlling processing by said processing elements and for generating input/output control messages;
C. an input/output processor responsive to input/output control messages from said control processors for initiating an input/output operation to transfer data in data messages with at least a selected subset of said processing elements;
D. a routing network for transferring data messages between said processing elements and said input/output processor and input/output control messages between said control processors and said input/output processor; and
E. a control network for transferring said processing control messages between said control processors and said processing elements, said control network being partitionable into a plurality of partitions each facilitating the transfer of processing control messages between at least one control processor and selected ones of said processing elements.

39. An input/output processor including a plurality of input/output buffers connected to a series of leaf nodes of said routing network for generating messages for transfer over said routing network to a plurality of data receivers each connected to one of a second series of nodes of said routing network and identified by an address during an input/output operation, each input/output buffer including:
A. a transmit buffer for buffering a plurality of data items each to be transmitted in a message to a data receiver in a message; and
B. a destination data receiver address and offset generator for iteratively generating a destination data receiver address value and a destination offset value in response to the number of input/output buffers and the number of data receivers participating in the input/output operation.